Efficient Join in hive without OR condition











up vote
0
down vote

favorite












I need to join geographical region table to user's table in Hive.
geographical region can be country, state or city level.
When geographical region is county level, I need to select all the listings in that county so on. My version of hive does not allow OR in join condition.



What is the most efficient way to write this query?



For example,



Region table



region_id , city, state, country
1, Rome, NULL , IT
2, NULL, NULL, BM
3, VANCOUVER, BC, CA


User table



user_id, city , state, country
103 , VANCOUVER , BC , CA
105 , HAMILTON, NULL, BM
106 , NULL, NULL, BM


Result table



region_id, user_id, city, state, country
3, 103 , VANCOUVER , BC , CA
2, 105 , HAMILTON, NULL, BM
2, 106 , NULL, NULL, BM









share|improve this question




























    up vote
    0
    down vote

    favorite












    I need to join geographical region table to user's table in Hive.
    geographical region can be country, state or city level.
    When geographical region is county level, I need to select all the listings in that county so on. My version of hive does not allow OR in join condition.



    What is the most efficient way to write this query?



    For example,



    Region table



    region_id , city, state, country
    1, Rome, NULL , IT
    2, NULL, NULL, BM
    3, VANCOUVER, BC, CA


    User table



    user_id, city , state, country
    103 , VANCOUVER , BC , CA
    105 , HAMILTON, NULL, BM
    106 , NULL, NULL, BM


    Result table



    region_id, user_id, city, state, country
    3, 103 , VANCOUVER , BC , CA
    2, 105 , HAMILTON, NULL, BM
    2, 106 , NULL, NULL, BM









    share|improve this question


























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I need to join geographical region table to user's table in Hive.
      geographical region can be country, state or city level.
      When geographical region is county level, I need to select all the listings in that county so on. My version of hive does not allow OR in join condition.



      What is the most efficient way to write this query?



      For example,



      Region table



      region_id , city, state, country
      1, Rome, NULL , IT
      2, NULL, NULL, BM
      3, VANCOUVER, BC, CA


      User table



      user_id, city , state, country
      103 , VANCOUVER , BC , CA
      105 , HAMILTON, NULL, BM
      106 , NULL, NULL, BM


      Result table



      region_id, user_id, city, state, country
      3, 103 , VANCOUVER , BC , CA
      2, 105 , HAMILTON, NULL, BM
      2, 106 , NULL, NULL, BM









      share|improve this question















      I need to join geographical region table to user's table in Hive.
      geographical region can be country, state or city level.
      When geographical region is county level, I need to select all the listings in that county so on. My version of hive does not allow OR in join condition.



      What is the most efficient way to write this query?



      For example,



      Region table



      region_id , city, state, country
      1, Rome, NULL , IT
      2, NULL, NULL, BM
      3, VANCOUVER, BC, CA


      User table



      user_id, city , state, country
      103 , VANCOUVER , BC , CA
      105 , HAMILTON, NULL, BM
      106 , NULL, NULL, BM


      Result table



      region_id, user_id, city, state, country
      3, 103 , VANCOUVER , BC , CA
      2, 105 , HAMILTON, NULL, BM
      2, 106 , NULL, NULL, BM






      sql hadoop hive hiveql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 19 at 0:41

























      asked Nov 19 at 0:36









      user1411335

      5791615




      5791615
























          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          Well it may not be as efficient as you would like, but this should work:



          SELECT DISTINCT
          coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
          FROM users u
          LEFT JOIN regions cty ON u.city = cty.city
          LEFT JOIN regions sta ON u.state = sta.state
          LEFT JOIN regions cou ON u.ccountyity = cou.county


          and alternative would be:



          SELECT
          r.region_id
          , u.*
          FROM users u
          INNER JOIN (
          SELECT
          regions.region_id, users.user_id
          FROM users
          INNER JOIN regions ON users.city = regions.city
          UNION
          SELECT
          regions.region_id, users.user_id
          FROM users
          INNER JOIN regions ON usesr.state = regions.state
          UNION
          SELECT
          regions.region_id, users.user_id
          FROM users
          INNER JOIN regions ON users.ccounty = regions.county
          ) r ON u.users_id = r.users_id





          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














             

            draft saved


            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53366910%2fefficient-join-in-hive-without-or-condition%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            1
            down vote



            accepted










            Well it may not be as efficient as you would like, but this should work:



            SELECT DISTINCT
            coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
            FROM users u
            LEFT JOIN regions cty ON u.city = cty.city
            LEFT JOIN regions sta ON u.state = sta.state
            LEFT JOIN regions cou ON u.ccountyity = cou.county


            and alternative would be:



            SELECT
            r.region_id
            , u.*
            FROM users u
            INNER JOIN (
            SELECT
            regions.region_id, users.user_id
            FROM users
            INNER JOIN regions ON users.city = regions.city
            UNION
            SELECT
            regions.region_id, users.user_id
            FROM users
            INNER JOIN regions ON usesr.state = regions.state
            UNION
            SELECT
            regions.region_id, users.user_id
            FROM users
            INNER JOIN regions ON users.ccounty = regions.county
            ) r ON u.users_id = r.users_id





            share|improve this answer



























              up vote
              1
              down vote



              accepted










              Well it may not be as efficient as you would like, but this should work:



              SELECT DISTINCT
              coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
              FROM users u
              LEFT JOIN regions cty ON u.city = cty.city
              LEFT JOIN regions sta ON u.state = sta.state
              LEFT JOIN regions cou ON u.ccountyity = cou.county


              and alternative would be:



              SELECT
              r.region_id
              , u.*
              FROM users u
              INNER JOIN (
              SELECT
              regions.region_id, users.user_id
              FROM users
              INNER JOIN regions ON users.city = regions.city
              UNION
              SELECT
              regions.region_id, users.user_id
              FROM users
              INNER JOIN regions ON usesr.state = regions.state
              UNION
              SELECT
              regions.region_id, users.user_id
              FROM users
              INNER JOIN regions ON users.ccounty = regions.county
              ) r ON u.users_id = r.users_id





              share|improve this answer

























                up vote
                1
                down vote



                accepted







                up vote
                1
                down vote



                accepted






                Well it may not be as efficient as you would like, but this should work:



                SELECT DISTINCT
                coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
                FROM users u
                LEFT JOIN regions cty ON u.city = cty.city
                LEFT JOIN regions sta ON u.state = sta.state
                LEFT JOIN regions cou ON u.ccountyity = cou.county


                and alternative would be:



                SELECT
                r.region_id
                , u.*
                FROM users u
                INNER JOIN (
                SELECT
                regions.region_id, users.user_id
                FROM users
                INNER JOIN regions ON users.city = regions.city
                UNION
                SELECT
                regions.region_id, users.user_id
                FROM users
                INNER JOIN regions ON usesr.state = regions.state
                UNION
                SELECT
                regions.region_id, users.user_id
                FROM users
                INNER JOIN regions ON users.ccounty = regions.county
                ) r ON u.users_id = r.users_id





                share|improve this answer














                Well it may not be as efficient as you would like, but this should work:



                SELECT DISTINCT
                coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
                FROM users u
                LEFT JOIN regions cty ON u.city = cty.city
                LEFT JOIN regions sta ON u.state = sta.state
                LEFT JOIN regions cou ON u.ccountyity = cou.county


                and alternative would be:



                SELECT
                r.region_id
                , u.*
                FROM users u
                INNER JOIN (
                SELECT
                regions.region_id, users.user_id
                FROM users
                INNER JOIN regions ON users.city = regions.city
                UNION
                SELECT
                regions.region_id, users.user_id
                FROM users
                INNER JOIN regions ON usesr.state = regions.state
                UNION
                SELECT
                regions.region_id, users.user_id
                FROM users
                INNER JOIN regions ON users.ccounty = regions.county
                ) r ON u.users_id = r.users_id






                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 19 at 2:08

























                answered Nov 19 at 2:02









                Used_By_Already

                21.7k21838




                21.7k21838






























                     

                    draft saved


                    draft discarded



















































                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53366910%2fefficient-join-in-hive-without-or-condition%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Paul Cézanne

                    UIScrollView CustomStickyHeader Resize height generates problems when scroll is too fast

                    Angular material date-picker (MatDatepicker) auto completes the date on focus out