How do I output the results of a HiveQL query to CSV?

We would like to write the results of a Hive query to a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?

Thanks!










      database hadoop hive bigdata hiveql






      asked Aug 8 '13 at 15:07 by AAA (edited Oct 28 '14 at 16:17 by Vldb.User)
          12 Answers
          Answer by Lukas Vermeer (138 votes), Aug 9 '13:
          Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.



          According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.




          Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.




          A slight modification (adding the LOCAL keyword) will store the data in a local directory.



          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;


          When I run a similar query, here's what the output looks like.



          [lvermeer@hadoop temp]$ ll
          total 4
          -rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
          [lvermeer@hadoop temp]$ head 000000_0
          "row1""col1"1234"col3"1234FALSE
          "row2""col1"5678"col3"5678TRUE


          Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:



          hive -e 'select books from table' > /home/lvermeer/temp.tsv


          That gives me a tab-separated file that I can use. Hope that is useful for you as well.



          Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.



          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
          ROW FORMAT DELIMITED
          FIELDS TERMINATED BY ','
          select books from table;


          Hope that helps.
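          For the non-LOCAL variant from the question, where the results land in an HDFS directory, a minimal sketch of pulling them down to a single local file (assuming the hdfs CLI is on your path; the paths below are placeholders, not the actual output location):

          # Merge all part files from the HDFS output directory into one local file.
          # The columns will still be ^A (Ctrl-A) separated at this point.
          hdfs dfs -getmerge /user/hive/output /home/lvermeer/output.txt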






          • Thank you!! The hive -e approach worked perfectly! – AAA, Aug 9 '13 at 17:46

          • Do you know of any performance difference between insert overwrite local and piping, and at roughly what volume it becomes an issue? Also, piping guarantees you'll get one file, whereas the other approach gives us a directory which we potentially need to merge afterwards. – fd8s0, Nov 5 '14 at 14:56

          • Is it possible to export the data in HDFS as Sequence file format? – Nageswaran, Jul 27 '15 at 6:59

          • I tried the solution (patch-3682) and it worked well for me, except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth, the headers got printed to the terminal instead (which is obviously not what I wanted). – Peter Cogan, Dec 3 '15 at 19:22

          • @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method, the header information is lost. Is there a way to get the header information? – ML_Passion, Feb 23 '17 at 18:14



          Answer by David Kjerrumgaard (21 votes), Jan 23 '14:
          If you want a CSV file then you can modify Lukas' solution as follows (assuming you are on a Linux box):

          hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv
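          A small variation on the same idea, in case your values legitimately contain spaces (the sed above collapses every run of whitespace): translate only the tab delimiter. This assumes your column values contain no embedded tabs; the path is just a placeholder.

          # Convert only the tab delimiters to commas, leaving spaces inside values intact
          hive -e 'select books from table' | tr '\t' ',' > /home/lvermeer/temp.csv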





          • Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well. – jatal, Oct 27 '14 at 18:04

          • This is the cleanest solution. – Dutta, Sep 21 '15 at 17:08

          • Worked best, thanks. – Amrita Sawant, Feb 1 '16 at 18:24

          • This failed for me for, e.g., a date-time string that had a space between date and time. – williaster, Dec 15 '16 at 21:43

          • @williaster sed 's/\t/,/g' should help for this issue. – Sudhakar Chavan, Jun 1 '17 at 11:44










          Answer by Olaf (4 votes), Aug 8 '13:
          You should use the CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.

          You also might have to do some trickery to convert the files from '\001'-delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.
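          A rough sketch of that flow; the table name, warehouse path and local path below are assumptions, so adjust them to your setup:

          # Step 1: CTAS into a comma-delimited text table
          hive -e "CREATE TABLE books_csv ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE AS SELECT books FROM table;"

          # Step 2: pull the result files out of HDFS and concatenate them into one local CSV
          hdfs dfs -cat /user/hive/warehouse/books_csv/* > /home/user/books.csv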






          • This approach is best if one wants to use output in a subsequent oozie pipeline step. – cerd, Apr 13 '14 at 21:30



















          Answer by Ray (3 votes), Jul 29 '14:
          If you are using HUE, this is fairly simple as well. Simply go to the Hive editor in HUE, execute your Hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.






            Answer by sisanared (3 votes), Aug 27 '15:
            I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.

            To make the column data tsv-safe, I replaced all \t chars in the column data with a space, and executed python code on the command line to generate a csv file, as shown below:

            hive -e 'tab_replaced_hql_query' |  python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)\nfor row in reader: writer.writerow(row)")'


            This created a perfectly valid csv. Hope this helps those who come looking for this solution.
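            A minimal sketch of the tab-scrubbing step described above, using Hive's regexp_replace; the column and table names are made up for illustration:

            # Replace any tab characters inside string columns with a space before exporting,
            # so the tab-separated CLI output stays unambiguous, then pipe the result
            # through the Python one-liner above to produce the CSV.
            hive -e "SELECT id, regexp_replace(title, '\t', ' '), regexp_replace(author, '\t', ' ') FROM books_table"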






            • It's 2016 and we still have to jump through hoops to do this? I found shravster's solution to be the best, most elegant solution so far. – Josh, Jun 2 '16 at 14:44

            • How did you replace all \t chars in the column data? Did you address it in the query or create a separate view for it? – Naresh S, Apr 23 '18 at 8:30

            • @NareshS, sorry for the late response. Yes, the columns were handled in hive to replace tabs with spaces or, if they are essential, you could replace them with a substitute like <:tab>, or something along those lines. – sisanared, May 13 '18 at 18:10

            • @sisanared, thanks for the response. I see we need to use regex replace for all string columns, and this would be cumbersome if we have a table with a large number of columns (> 100). Is there a quick solution for such a case? – Naresh S, May 14 '18 at 8:42

            • @NareshS, unfortunately the only other solution is to clean up the data before putting it in your partitions. Otherwise you will have to do it while performing the select for all the string columns that could contain tab chars. – sisanared, May 18 '18 at 4:39



















            Answer (3 votes):
            You can use the Hive string function CONCAT_WS( string delimiter, string str1, string str2...strn ).

            For example:

            hive -e "select CONCAT_WS(',', cola, colb, colc..., coln) from Mytable" > /home/user/Mycsv.csv
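            One caveat worth checking: concat_ws expects string arguments, so non-string columns generally need an explicit cast first. A small sketch with invented column names:

            # Cast numeric columns to STRING before concatenating them with a comma delimiter
            hive -e "select concat_ws(',', cast(id as string), title, cast(price as string)) from Mytable" > /home/user/Mycsv.csv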





              Answer (2 votes):
              You can use INSERT … DIRECTORY, as in this example:

              INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
              SELECT name, salary, address
              FROM employees
              WHERE state = 'CA';


              OVERWRITE and LOCAL have the same interpretations as before and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.






                Answer (2 votes):
                I had a similar issue and this is how I was able to address it.



                Step 1 - Loaded the data from Hive table into another table as follows



                DROP TABLE IF EXISTS TestHiveTableCSV;
                CREATE TABLE TestHiveTableCSV
                ROW FORMAT DELIMITED
                FIELDS TERMINATED BY ','
                LINES TERMINATED BY '\n' AS
                SELECT <Column List> FROM TestHiveTable;

                Step 2 - Copied the blob from the Hive warehouse to the new location with the appropriate extension:

                Start-AzureStorageBlobCopy `
                  -DestContext $destContext `
                  -SrcContainer "Source Container" `
                  -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
                  -DestContainer "Destination Container" `
                  -DestBlob "CSV/TestHiveTable.csv"





                  Answer (1 vote):
                  The default separator is "^A". In the Python language, it is "\x01".

                  When I want to change the delimiter, I use SQL like:

                  SELECT col1, delimiter, col2, delimiter, col3, ... FROM table

                  Then I regard delimiter + "^A" as the new delimiter.
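                  If you already have the ^A-delimited files that INSERT OVERWRITE DIRECTORY leaves behind, one way to translate the separator is with tr; the paths here are only placeholders:

                  # \001 is the octal escape for the ^A (Ctrl-A) separator
                  hdfs dfs -cat /user/hive/output/000000_0 | tr '\001' ',' > output.csv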






                    Answer (0 votes):
                    Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as CSV.






                      Answer (0 votes):
                      In case you are doing it from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.

                      It will:

                      1. Log in to the bastion host.
                      2. pbrun.
                      3. kinit.
                      4. beeline (with your query).
                      5. Save the echo from beeline to a file on Windows.


                      Execute it like this:



                      set PROXY_HOST=your_bastion_host

                      set SERVICE_USER=you_func_user

                      set LINUX_USER=your_SOID

                      set LINUX_PWD=your_pwd

                      python hh.py --query_file=query.sql





                        Answer (0 votes):
                        I tried various options, but this would be one of the simplest solutions for Python pandas:

                        hive -e 'select books from table' | grep "|" > temp.csv

                        import pandas as pd
                        df = pd.read_csv("temp.csv", sep='|')

                        You can also use tr "|" "," to convert "|" to ",".
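                        A variation on the same idea that skips the "|" filtering and reads the CLI's tab-separated output directly; the file names below are placeholders:

                        # hive -e output is tab-separated, so read it with sep='\t' and write it back out as CSV
                        hive -e 'select books from table' > /tmp/books.tsv
                        python -c "import pandas as pd; pd.read_csv('/tmp/books.tsv', sep='\t', header=None).to_csv('/tmp/books.csv', index=False, header=False)"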






                        share|improve this answer























                          Your Answer






                          StackExchange.ifUsing("editor", function () {
                          StackExchange.using("externalEditor", function () {
                          StackExchange.using("snippets", function () {
                          StackExchange.snippets.init();
                          });
                          });
                          }, "code-snippets");

                          StackExchange.ready(function() {
                          var channelOptions = {
                          tags: "".split(" "),
                          id: "1"
                          };
                          initTagRenderer("".split(" "), "".split(" "), channelOptions);

                          StackExchange.using("externalEditor", function() {
                          // Have to fire editor after snippets, if snippets enabled
                          if (StackExchange.settings.snippets.snippetsEnabled) {
                          StackExchange.using("snippets", function() {
                          createEditor();
                          });
                          }
                          else {
                          createEditor();
                          }
                          });

                          function createEditor() {
                          StackExchange.prepareEditor({
                          heartbeatType: 'answer',
                          autoActivateHeartbeat: false,
                          convertImagesToLinks: true,
                          noModals: true,
                          showLowRepImageUploadWarning: true,
                          reputationToPostImages: 10,
                          bindNavPrevention: true,
                          postfix: "",
                          imageUploader: {
                          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                          allowUrls: true
                          },
                          onDemand: true,
                          discardSelector: ".discard-answer"
                          ,immediatelyShowMarkdownHelp:true
                          });


                          }
                          });














                          draft saved

                          draft discarded


















                          StackExchange.ready(
                          function () {
                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18129581%2fhow-do-i-output-the-results-of-a-hiveql-query-to-csv%23new-answer', 'question_page');
                          }
                          );

                          Post as a guest















                          Required, but never shown

























                          12 Answers
                          12






                          active

                          oldest

                          votes








                          12 Answers
                          12






                          active

                          oldest

                          votes









                          active

                          oldest

                          votes






                          active

                          oldest

                          votes









                          138














                          Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.



                          According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.




                          Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.




                          A slight modification (adding the LOCAL keyword) will store the data in a local directory.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;


                          When I run a similar query, here's what the output looks like.



                          [lvermeer@hadoop temp]$ ll
                          total 4
                          -rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
                          [lvermeer@hadoop temp]$ head 000000_0
                          "row1""col1"1234"col3"1234FALSE
                          "row2""col1"5678"col3"5678TRUE


                          Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:



                          hive -e 'select books from table' > /home/lvermeer/temp.tsv


                          That gives me a tab-separated file that I can use. Hope that is useful for you as well.



                          Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
                          ROW FORMAT DELIMITED
                          FIELDS TERMINATED BY ','
                          select books from table;


                          Hope that helps.






                          share|improve this answer



















                          • 3





                            Thank you!! The hive -e approach worked perfectly!

                            – AAA
                            Aug 9 '13 at 17:46






                          • 2





                            do you know any performance difference between insert overwrite local and piping, at which approximated volume it can become an issue, also, piping guarantees you'll get one file, as the other approach gives us a directory which potentially we need to merge afterwards

                            – fd8s0
                            Nov 5 '14 at 14:56











                          • Is it possible to export the data in HDFS as Sequence file format?

                            – Nageswaran
                            Jul 27 '15 at 6:59






                          • 1





                            I tried the solution (patch-3682) and it worked well for me - except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth the headers got printed to the terminal instead (which is obviously not what I wanted).

                            – Peter Cogan
                            Dec 3 '15 at 19:22











                          • @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method , the header information is lost . Is there a way to get the header information ?

                            – ML_Passion
                            Feb 23 '17 at 18:14
















                          138














                          Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.



                          According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.




                          Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.




                          A slight modification (adding the LOCAL keyword) will store the data in a local directory.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;


                          When I run a similar query, here's what the output looks like.



                          [lvermeer@hadoop temp]$ ll
                          total 4
                          -rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
                          [lvermeer@hadoop temp]$ head 000000_0
                          "row1""col1"1234"col3"1234FALSE
                          "row2""col1"5678"col3"5678TRUE


                          Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:



                          hive -e 'select books from table' > /home/lvermeer/temp.tsv


                          That gives me a tab-separated file that I can use. Hope that is useful for you as well.



                          Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
                          ROW FORMAT DELIMITED
                          FIELDS TERMINATED BY ','
                          select books from table;


                          Hope that helps.






                          share|improve this answer



















                          • 3





                            Thank you!! The hive -e approach worked perfectly!

                            – AAA
                            Aug 9 '13 at 17:46






                          • 2





                            do you know any performance difference between insert overwrite local and piping, at which approximated volume it can become an issue, also, piping guarantees you'll get one file, as the other approach gives us a directory which potentially we need to merge afterwards

                            – fd8s0
                            Nov 5 '14 at 14:56











                          • Is it possible to export the data in HDFS as Sequence file format?

                            – Nageswaran
                            Jul 27 '15 at 6:59






                          • 1





                            I tried the solution (patch-3682) and it worked well for me - except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth the headers got printed to the terminal instead (which is obviously not what I wanted).

                            – Peter Cogan
                            Dec 3 '15 at 19:22











                          • @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method , the header information is lost . Is there a way to get the header information ?

                            – ML_Passion
                            Feb 23 '17 at 18:14














                          138












                          138








                          138







                          Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.



                          According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.




                          Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.




                          A slight modification (adding the LOCAL keyword) will store the data in a local directory.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;


                          When I run a similar query, here's what the output looks like.



                          [lvermeer@hadoop temp]$ ll
                          total 4
                          -rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
                          [lvermeer@hadoop temp]$ head 000000_0
                          "row1""col1"1234"col3"1234FALSE
                          "row2""col1"5678"col3"5678TRUE


                          Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:



                          hive -e 'select books from table' > /home/lvermeer/temp.tsv


                          That gives me a tab-separated file that I can use. Hope that is useful for you as well.



                          Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
                          ROW FORMAT DELIMITED
                          FIELDS TERMINATED BY ','
                          select books from table;


                          Hope that helps.






                          share|improve this answer













                          Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.



                          According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.




                          Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.




                          A slight modification (adding the LOCAL keyword) will store the data in a local directory.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;


                          When I run a similar query, here's what the output looks like.



                          [lvermeer@hadoop temp]$ ll
                          total 4
                          -rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
                          [lvermeer@hadoop temp]$ head 000000_0
                          "row1""col1"1234"col3"1234FALSE
                          "row2""col1"5678"col3"5678TRUE


                          Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:



                          hive -e 'select books from table' > /home/lvermeer/temp.tsv


                          That gives me a tab-separated file that I can use. Hope that is useful for you as well.



                          Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.



                          INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
                          ROW FORMAT DELIMITED
                          FIELDS TERMINATED BY ','
                          select books from table;


                          Hope that helps.







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Aug 9 '13 at 7:44









                          Lukas VermeerLukas Vermeer

                          4,68221119




                          4,68221119








                          • 3





                            Thank you!! The hive -e approach worked perfectly!

                            – AAA
                            Aug 9 '13 at 17:46






                          • 2





                            do you know any performance difference between insert overwrite local and piping, at which approximated volume it can become an issue, also, piping guarantees you'll get one file, as the other approach gives us a directory which potentially we need to merge afterwards

                            – fd8s0
                            Nov 5 '14 at 14:56











                          • Is it possible to export the data in HDFS as Sequence file format?

                            – Nageswaran
                            Jul 27 '15 at 6:59






                          • 1





                            I tried the solution (patch-3682) and it worked well for me - except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth the headers got printed to the terminal instead (which is obviously not what I wanted).

                            – Peter Cogan
                            Dec 3 '15 at 19:22











                          • @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method , the header information is lost . Is there a way to get the header information ?

                            – ML_Passion
                            Feb 23 '17 at 18:14














                          • 3





                            Thank you!! The hive -e approach worked perfectly!

                            – AAA
                            Aug 9 '13 at 17:46






                          • 2





                            do you know any performance difference between insert overwrite local and piping, at which approximated volume it can become an issue, also, piping guarantees you'll get one file, as the other approach gives us a directory which potentially we need to merge afterwards

                            – fd8s0
                            Nov 5 '14 at 14:56











                          • Is it possible to export the data in HDFS as Sequence file format?

                            – Nageswaran
                            Jul 27 '15 at 6:59






                          • 1





                            I tried the solution (patch-3682) and it worked well for me - except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth the headers got printed to the terminal instead (which is obviously not what I wanted).

                            – Peter Cogan
                            Dec 3 '15 at 19:22











                          • @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method , the header information is lost . Is there a way to get the header information ?

                            – ML_Passion
                            Feb 23 '17 at 18:14








                          3




                          3





                          Thank you!! The hive -e approach worked perfectly!

                          – AAA
                          Aug 9 '13 at 17:46





                          Thank you!! The hive -e approach worked perfectly!

                          – AAA
                          Aug 9 '13 at 17:46




                          2




                          2





                          do you know any performance difference between insert overwrite local and piping, at which approximated volume it can become an issue, also, piping guarantees you'll get one file, as the other approach gives us a directory which potentially we need to merge afterwards

                          – fd8s0
                          Nov 5 '14 at 14:56





                          do you know any performance difference between insert overwrite local and piping, at which approximated volume it can become an issue, also, piping guarantees you'll get one file, as the other approach gives us a directory which potentially we need to merge afterwards

                          – fd8s0
                          Nov 5 '14 at 14:56













                          Is it possible to export the data in HDFS as Sequence file format?

                          – Nageswaran
                          Jul 27 '15 at 6:59





                          Is it possible to export the data in HDFS as Sequence file format?

                          – Nageswaran
                          Jul 27 '15 at 6:59




                          1




                          1





                          I tried the solution (patch-3682) and it worked well for me - except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth the headers got printed to the terminal instead (which is obviously not what I wanted).

                          – Peter Cogan
                          Dec 3 '15 at 19:22





                          I tried the solution (patch-3682) and it worked well for me - except that for some reason the output file did not include the headers. Note that I have set hive.cli.print.header=true; in my .hiverc. For what it's worth the headers got printed to the terminal instead (which is obviously not what I wanted).

                          – Peter Cogan
                          Dec 3 '15 at 19:22













                          @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method , the header information is lost . Is there a way to get the header information ?

                          – ML_Passion
                          Feb 23 '17 at 18:14





                          @lukas-vermeer, when you create the table using the "INSERT OVERWRITE" method , the header information is lost . Is there a way to get the header information ?

                          – ML_Passion
                          Feb 23 '17 at 18:14













                          21














                          If you want a CSV file then you can modify Lukas' solutions as follows (assuming you are on a linux box):



                          hive -e 'select books from table' | sed 's/[[:space:]]+/,/g' > /home/lvermeer/temp.csv





                          share|improve this answer





















                          • 4





                            Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.

                            – jatal
                            Oct 27 '14 at 18:04













                          • This is the cleanest solution

                            – Dutta
                            Sep 21 '15 at 17:08











                          • worked best thanks

                            – Amrita Sawant
                            Feb 1 '16 at 18:24






                          • 1





                            This failed for me for e.g., a date time string that had a space between date and time.

                            – williaster
                            Dec 15 '16 at 21:43











                          • @williaster sed 's/t+/,/g' this should help for this issue.

                            – Sudhakar Chavan
                            Jun 1 '17 at 11:44


















                          21














                          If you want a CSV file then you can modify Lukas' solutions as follows (assuming you are on a linux box):



                          hive -e 'select books from table' | sed 's/[[:space:]]+/,/g' > /home/lvermeer/temp.csv





                          share|improve this answer





















                          • 4





                            Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.

                            – jatal
                            Oct 27 '14 at 18:04













                          • This is the cleanest solution

                            – Dutta
                            Sep 21 '15 at 17:08











                          • worked best thanks

                            – Amrita Sawant
                            Feb 1 '16 at 18:24






                          • 1





                            This failed for me for e.g., a date time string that had a space between date and time.

                            – williaster
                            Dec 15 '16 at 21:43











                          • @williaster sed 's/t+/,/g' this should help for this issue.

                            – Sudhakar Chavan
                            Jun 1 '17 at 11:44
















                          21












                          21








                          21







                          If you want a CSV file then you can modify Lukas' solutions as follows (assuming you are on a linux box):



                          hive -e 'select books from table' | sed 's/[[:space:]]+/,/g' > /home/lvermeer/temp.csv





                          share|improve this answer















                          If you want a CSV file then you can modify Lukas' solutions as follows (assuming you are on a linux box):



                          hive -e 'select books from table' | sed 's/[[:space:]]+/,/g' > /home/lvermeer/temp.csv






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Jan 23 '14 at 19:43









                          dee-see

                          19.1k44579




                          19.1k44579










                          answered Jan 23 '14 at 19:22









                          David KjerrumgaardDavid Kjerrumgaard

                          38123




                          38123








                          • 4





                            Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.

                            – jatal
                            Oct 27 '14 at 18:04













                          • This is the cleanest solution

                            – Dutta
                            Sep 21 '15 at 17:08











                          • worked best thanks

                            – Amrita Sawant
                            Feb 1 '16 at 18:24






                          • 1





                            This failed for me for e.g., a date time string that had a space between date and time.

                            – williaster
                            Dec 15 '16 at 21:43











                          • @williaster sed 's/t+/,/g' this should help for this issue.

                            – Sudhakar Chavan
                            Jun 1 '17 at 11:44
















                          • 4





                            Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.

                            – jatal
                            Oct 27 '14 at 18:04













                          • This is the cleanest solution

                            – Dutta
                            Sep 21 '15 at 17:08











                          • worked best thanks

                            – Amrita Sawant
                            Feb 1 '16 at 18:24






                          • 1





                            This failed for me for e.g., a date time string that had a space between date and time.

                            – williaster
                            Dec 15 '16 at 21:43











                          • @williaster sed 's/t+/,/g' this should help for this issue.

                            – Sudhakar Chavan
                            Jun 1 '17 at 11:44










                          4




                          4





                          Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.

                          – jatal
                          Oct 27 '14 at 18:04







                          Thanks for this. I am using a variation, but it works very well. Please note that this will output comma-delimited, not necessarily what some folks think of as CSV. CSV typically has some formatting to handle data with commas (e.g. wrap data with double-quotes, and double-double-quote for data with double-quotes). Worth mentioning that adding the "--hiveconf hive.cli.print.header=True" parameter will get your headers in the output as well.

                          – jatal
                          Oct 27 '14 at 18:04















                          This is the cleanest solution

                          – Dutta
                          Sep 21 '15 at 17:08





                          This is the cleanest solution

                          – Dutta
                          Sep 21 '15 at 17:08













                          worked best thanks

                          – Amrita Sawant
                          Feb 1 '16 at 18:24





                          worked best thanks

                          – Amrita Sawant
                          Feb 1 '16 at 18:24




                          1




                          1





                          This failed for me for e.g., a date time string that had a space between date and time.

                          – williaster
                          Dec 15 '16 at 21:43





                          This failed for me for e.g., a date time string that had a space between date and time.

                          – williaster
                          Dec 15 '16 at 21:43













                          @williaster sed 's/t+/,/g' this should help for this issue.

                          – Sudhakar Chavan
                          Jun 1 '17 at 11:44







                          @williaster sed 's/t+/,/g' this should help for this issue.

                          – Sudhakar Chavan
                          Jun 1 '17 at 11:44













                          4














                          You should use CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.



                          You also might have to do some trickery to convert the files from '01' - delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.






                          share|improve this answer
























                          • This approach is best if one wants to use output in a subsequent oozie pipeline step.

                            – cerd
                            Apr 13 '14 at 21:30
















                          4














                          You should use CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.



                          You also might have to do some trickery to convert the files from '01' - delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.






                          share|improve this answer
























                          • This approach is best if one wants to use output in a subsequent oozie pipeline step.

                            – cerd
                            Apr 13 '14 at 21:30














                          4












                          4








                          4







                          You should use CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.



                          You also might have to do some trickery to convert the files from '01' - delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.






                          share|improve this answer













                          You should use CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.



                          You also might have to do some trickery to convert the files from '01' - delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Aug 8 '13 at 17:23









                          OlafOlaf

                          5,77011437




                          5,77011437













                          • This approach is best if one wants to use output in a subsequent oozie pipeline step.

                            – cerd
                            Apr 13 '14 at 21:30



















                          • This approach is best if one wants to use output in a subsequent oozie pipeline step.

                            – cerd
                            Apr 13 '14 at 21:30

















                          This approach is best if one wants to use output in a subsequent oozie pipeline step.

                          – cerd
                          Apr 13 '14 at 21:30





                          This approach is best if one wants to use output in a subsequent oozie pipeline step.

                          – cerd
                          Apr 13 '14 at 21:30











                          3














                          If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.






                          share|improve this answer




























                            3














                            If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.






                            share|improve this answer


























                              3












                              3








                              3







                              If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.






                              share|improve this answer













                              If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Jul 29 '14 at 18:00









                              RayRay

                              8128




                              8128























                                  3














                                  I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.



                                  To make the column data tsv safe, I replaced all t chars in the column data with a space, and executed python code on the commandline to generate a csv file, as shown below:



                                  hive -e 'tab_replaced_hql_query' |  python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)nfor row in reader: writer.writerow(row)")'


                                  This created a perfectly valid csv. Hope this helps those who come looking for this solution.






                                  share|improve this answer
























                                  • It's 2016 and we still have to jump through hoops to do this? I found shravster's solution to be the best, most elegant solution so far.

                                    – Josh
                                    Jun 2 '16 at 14:44











                                  • How did you replace all t chars in the column data ? did you address it in the query or created a separate view for it?

                                    – Naresh S
                                    Apr 23 '18 at 8:30











                                  • @NareshS, sorry for the late response. Yes, the columns were handled in hive to replace tabs with spaces or if they are essential, you could replace with a substitute like <:tab>, or something along those lines

                                    – sisanared
                                    May 13 '18 at 18:10











                                  • @sisanared, Thanks for the response. I see we need to use regex replace for all string columns and this would be cumbersome if we have a table with large number of colums > 100. Is there a quick solution for such case

                                    – Naresh S
                                    May 14 '18 at 8:42











                                  • @NareshS, unfortunately the only other solution is to clean up data before putting it in your partitions. Otherwise you will have to do it while performing the select for all the string columns that could contain tab chars

                                    – sisanared
                                    May 18 '18 at 4:39
















answered Aug 27 '15 at 0:49 by sisanared
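Following up on the comment thread above: if only a handful of columns can actually contain tab characters, one way to keep this pipeline working is to scrub just those columns with regexp_replace inside the query itself. A minimal sketch, assuming a hypothetical table my_table with string columns col_a and col_b and a numeric column col_c (all names and paths here are illustrative, not from the original answer):

hive -e 'select
    regexp_replace(col_a, "\t", " "),
    regexp_replace(col_b, "\t", " "),
    col_c
from my_table' > /tmp/tab_safe_output.tsv

The resulting tab-separated output can then be piped through the same Python csv one-liner from the answer above. For very wide tables you could generate the regexp_replace expressions from the output of DESCRIBE my_table rather than typing them by hand.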














You can use the Hive string function CONCAT_WS(string delimiter, string str1, string str2, ..., string strn), for example:

hive -e 'select CONCAT_WS(",", cola, colb, colc, ..., coln) from Mytable' > /home/user/Mycsv.csv





answered Apr 8 '15 at 21:03 by Ram Ghadiyaram (edited Jun 27 '16)
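One caveat with the CONCAT_WS approach above: if a column value can itself contain the delimiter, the resulting file will not parse cleanly. A rough, hedged variant that wraps such a column in double quotes inside the same query (assuming string columns; names are placeholders, and embedded double quotes in the data are still not escaped):

hive -e 'select CONCAT_WS(",", cola, CONCAT("\"", colb, "\""), colc) from Mytable' > /home/user/Mycsv.csv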














You can use INSERT ... DIRECTORY, as in this example:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE state = 'CA';

OVERWRITE and LOCAL have the same interpretations as before, and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.






answered Jun 29 '14 at 7:38 by bigmakers (edited Jun 27 '16 by jotik)
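On Hive 0.11 and later, the directory export itself can be told to write comma-separated fields instead of the default ^A delimiter, which removes the need for post-processing. A sketch along the lines of the example above (assuming a recent enough Hive; the table, columns and path are the same illustrative ones):

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees_csv'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT name, salary, address
FROM employees
WHERE state = 'CA';

The directory will still contain one file per reducer (000000_0, 000001_0, ...), so concatenate them if you need a single CSV file.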














I had a similar issue and this is how I was able to address it.

Step 1 - Loaded the data from the Hive table into another table as follows:

DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' AS
SELECT Column List FROM TestHiveTable;

Step 2 - Copied the blob from the Hive warehouse to the new location with the appropriate extension (this example uses the Azure PowerShell cmdlet Start-AzureStorageBlobCopy):

Start-AzureStorageBlobCopy
  -DestContext $destContext
  -SrcContainer "Source Container"
  -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0"
  -DestContainer "Destination Container"
  -DestBlob "CSV/TestHiveTable.csv"





answered May 29 '14 at 14:24 by Dattatrey Sindol (edited Jun 30 '17 by CalvT)
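If you are not on Azure, the same two-step idea works against plain HDFS: after the CREATE TABLE ... AS SELECT, pull the delimited files for the staging table down to a single local file. A sketch assuming the default warehouse location (the path varies per installation, so check where your table actually lives):

# merge all result files of the staging table into one local CSV
hadoop fs -getmerge /user/hive/warehouse/testhivetablecsv TestHiveTable.csv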














The default separator is "^A". In Python, it is "\x01".

When I want to change the delimiter, I use SQL like:

SELECT col1, delimiter, col2, delimiter, col3, ... FROM table

Then, regard delimiter + "^A" as the new delimiter.






answered Nov 6 '13 at 22:27 by moshaholo (edited Jun 27 '16 by Ram Ghadiyaram)
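If you keep the default ^A separator when exporting to a directory, the conversion to commas can also be done afterwards on the command line. A small sketch (the path is hypothetical, and values that themselves contain commas are not quoted):

# Ctrl-A is octal 001; replace every occurrence with a comma
tr '\001' ',' < /tmp/hive_out/000000_0 > /tmp/hive_out/output.csv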














                                                                  Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as csv.






answered Jan 3 '18 at 13:35 by schoon














In case you are doing it from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.

It will:

1. Log in to the bastion host.
2. Run pbrun.
3. Run kinit.
4. Run beeline (with your query).
5. Save the echo from beeline to a file on Windows.

Execute it like this:

set PROXY_HOST=your_bastion_host
set SERVICE_USER=your_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd
python hh.py --query_file=query.sql





answered Nov 21 '18 at 14:57 by Alex B














I tried various options, but this would be one of the simplest solutions for Python Pandas:

hive -e 'select books from table' | grep "|" > temp.csv

import pandas as pd
df = pd.read_csv("temp.csv", sep='|')

You can also use tr "|" "," to convert "|" to ",".






answered Dec 11 '18 at 21:34 by notilas





























