How do I output the results of a HiveQL query to CSV?
We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:
insert overwrite directory '/home/output.csv' select books from table;
When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?
Thanks!
database hadoop hive bigdata hiveql
asked Aug 8 '13 at 15:07 by AAA; edited Oct 28 '14 at 16:17 by Vldb.User
12 Answers
Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.
According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.
Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.
A slight modification (adding the LOCAL keyword) will store the data in a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;
When I run a similar query, here's what the output looks like.
[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select books from table' > /home/lvermeer/temp.tsv
That gives me a tab-separated file that I can use. Hope that is useful for you as well.
Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
Hope that helps.
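A related sketch, not from the original answer: with the non-LOCAL form of the same statement, the result is written into HDFS, possibly as several reducer output files, and hadoop fs -getmerge can pull them back into one local file. The directory and file names below are made up for illustration:
INSERT OVERWRITE DIRECTORY '/user/lvermeer/books_out'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
hadoop fs -getmerge /user/lvermeer/books_out /home/lvermeer/books.csv
This also addresses the "directory of part files" concern raised in one of the comments below.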
Thank you!! The hive -e approach worked perfectly!
– AAA, Aug 9 '13 at 17:46
Do you know of any performance difference between insert overwrite local and piping, and at roughly what volume it becomes an issue? Also, piping guarantees you'll get one file, whereas the other approach gives us a directory which we potentially need to merge afterwards.
– fd8s0, Nov 5 '14 at 14:56
Is it possible to export the data in HDFS as Sequence file format?
– Nageswaran, Jul 27 '15 at 6:59
I tried the solution (patch-3682) and it worked well for me, except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth, the headers got printed to the terminal instead (which is obviously not what I wanted).
– Peter Cogan, Dec 3 '15 at 19:22
@lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method, the header information is lost. Is there a way to get the header information?
– ML_Passion, Feb 23 '17 at 18:14
If you want a CSV file then you can modify Lukas' solution as follows (assuming you are on a Linux box):
hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv
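If you also want a header row, one possible variant (my addition, using the same flag a commenter mentions below) is to enable hive.cli.print.header before piping:
hive --hiveconf hive.cli.print.header=true -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv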
Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.
– jatal, Oct 27 '14 at 18:04
This is the cleanest solution.
– Dutta, Sep 21 '15 at 17:08
Worked best, thanks.
– Amrita Sawant, Feb 1 '16 at 18:24
This failed for me for, e.g., a date-time string that had a space between the date and the time.
– williaster, Dec 15 '16 at 21:43
@williaster sed 's/\t\+/,/g' should help with this issue.
– Sudhakar Chavan, Jun 1 '17 at 11:44
You should use a CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.
You also might have to do some trickery to convert the files from '\001'-delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.
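A rough sketch of that route, substituting the built-in OpenCSVSerde (Hive 0.14+) for a custom SerDe; the table name and warehouse path below are assumptions, not part of the original answer:
DROP TABLE IF EXISTS books_export;
CREATE TABLE books_export
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
AS SELECT books FROM table;
hdfs dfs -cat /user/hive/warehouse/books_export/* > /home/user/books.csv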
This approach is best if one wants to use the output in a subsequent oozie pipeline step.
– cerd, Apr 13 '14 at 21:30
If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.
I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.
To make the column data tsv safe, I replaced all \t chars in the column data with a space, and executed python code on the commandline to generate a csv file, as shown below:
hive -e 'tab_replaced_hql_query' | python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)\nfor row in reader: writer.writerow(row)")'
This created a perfectly valid csv. Hope this helps those who come looking for this solution.
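For readability, the same one-liner can be unrolled into a small helper script (equivalent logic; the file name to_csv.py is made up):
# to_csv.py: read tab-separated rows from stdin, write Excel-style CSV to stdout
import csv
import sys
reader = csv.reader(sys.stdin, dialect=csv.excel_tab)
writer = csv.writer(sys.stdout, dialect=csv.excel)
for row in reader:
    writer.writerow(row)
Used the same way: hive -e 'tab_replaced_hql_query' | python to_csv.py > output.csv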
It's 2016 and we still have to jump through hoops to do this? I found shravster's solution to be the best, most elegant solution so far.
– Josh, Jun 2 '16 at 14:44
How did you replace all \t chars in the column data? Did you address it in the query or create a separate view for it?
– Naresh S, Apr 23 '18 at 8:30
@NareshS, sorry for the late response. Yes, the columns were handled in Hive to replace tabs with spaces, or, if they are essential, you could replace them with a substitute like <:tab>, or something along those lines.
– sisanared, May 13 '18 at 18:10
@sisanared, thanks for the response. I see we need to use regex replace for all string columns, and this would be cumbersome if we have a table with a large number of columns (> 100). Is there a quick solution for such a case?
– Naresh S, May 14 '18 at 8:42
@NareshS, unfortunately the only other solution is to clean up the data before putting it into your partitions. Otherwise you will have to do it while performing the select, for all the string columns that could contain tab chars.
– sisanared, May 18 '18 at 4:39
You can use the Hive string function CONCAT_WS(string delimiter, string str1, string str2...strn).
For example:
hive -e "select CONCAT_WS(',', cola, colb, colc..., coln) from Mytable" > /home/user/Mycsv.csv
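One caveat worth adding (my note, not from the answer): CONCAT_WS only accepts string arguments, so non-string columns need an explicit cast. A sketch with invented column names:
hive -e "select CONCAT_WS(',', cast(id as string), name, cast(price as string)) from Mytable" > /home/user/Mycsv.csv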
You can use INSERT … DIRECTORY …, as in this example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE state = 'CA';
OVERWRITE and LOCAL have the same interpretations as before, and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.
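Because that directory can hold several part files and the fields are ^A-delimited by default, a common follow-up (my own sketch, not from the answer, and assuming the data itself contains no commas) is to concatenate the parts and swap the delimiter:
cat /tmp/ca_employees/* | tr '\001' ',' > /tmp/ca_employees.csv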
I had a similar issue and this is how I was able to address it.
Step 1 - Loaded the data from Hive table into another table as follows
DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' AS
SELECT <Column List> FROM TestHiveTable;
Step 2 - Copied the blob from Hive warehouse to the new location with appropriate extension
Start-AzureStorageBlobCopy `
  -DestContext $destContext `
  -SrcContainer "Source Container" `
  -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
  -DestContainer "Destination Container" `
  -DestBlob "CSV/TestHiveTable.csv"
The default separator is "^A". In Python, it is "\x01".
When I want to change the delimiter, I use SQL like:
SELECT col1, delimiter, col2, delimiter, col3, ... FROM table
Then, regard delimiter + "^A" as the new delimiter.
Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as csv.
In case you are doing it from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.
It will:
- Login to bastion host.
- pbrun.
- kinit.
- beeline (with your query).
- Save echo from beeline to a file on Windows.
Execute it like this:
set PROXY_HOST=your_bastion_host
set SERVICE_USER=you_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd
python hh.py --query_file=query.sql
I tried various options, but this would be one of the simplest solutions for Python Pandas:
hive -e 'select books from table' | grep "|" > temp.csv
import pandas as pd
df = pd.read_csv("temp.csv", sep='|')
You can also use tr "|" "," to convert "|" to ",".