Efficient join in Hive without an OR condition
I need to join a geographical region table to a user table in Hive.
A geographical region can be defined at country, state, or city level.
When a region is at country level, I need to select all the users in that country, and likewise for state and city. My version of Hive does not allow OR in a join condition.
What is the most efficient way to write this query?
For example,
Region table
region_id, city, state, country
1, Rome, NULL, IT
2, NULL, NULL, BM
3, VANCOUVER, BC, CA

User table
user_id, city, state, country
103, VANCOUVER, BC, CA
105, HAMILTON, NULL, BM
106, NULL, NULL, BM

Result table
region_id, user_id, city, state, country
3, 103, VANCOUVER, BC, CA
2, 105, HAMILTON, NULL, BM
2, 106, NULL, NULL, BM
sql hadoop hive hiveql
edited Nov 19 at 0:41
asked Nov 19 at 0:36
user1411335
1 Answer
accepted
Well it may not be as efficient as you would like, but this should work:
SELECT DISTINCT
coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
FROM users u
LEFT JOIN regions cty ON u.city = cty.city
LEFT JOIN regions sta ON u.state = sta.state
LEFT JOIN regions cou ON u.country = cou.country
An alternative would be:
SELECT
r.region_id
, u.*
FROM users u
INNER JOIN (
SELECT
regions.region_id, users.user_id
FROM users
INNER JOIN regions ON users.city = regions.city
UNION
SELECT
regions.region_id, users.user_id
FROM users
INNER JOIN regions ON users.state = regions.state
UNION
SELECT
regions.region_id, users.user_id
FROM users
INNER JOIN regions ON users.country = regions.country
) r ON u.user_id = r.user_id
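A possible variant, offered only as a sketch: the queries above join on one column at a time, so a city-level region such as (VANCOUVER, BC, CA) could also match every user in CA through the country join. If the level of a region is encoded by which columns are NULL, as the sample data suggests (that is an assumption, as are the table names users and regions), each level can get its own plain equi-join, and the three results can be combined with UNION ALL. No OR is needed in any join condition. The COALESCE on state is there because a city-level region such as Rome has a NULL state, and NULL = NULL never matches.

```sql
-- Sketch only. Assumptions: tables are named users and regions, with the
-- columns shown in the question, and a region's level is implied by its
-- NULL columns (city set = city level; city and state NULL = country level).
SELECT matched.*
FROM (
  -- City-level regions: country and city must match; state is compared
  -- with COALESCE so that two NULL states count as equal.
  SELECT r.region_id, u.user_id, u.city, u.state, u.country
  FROM users u
  JOIN regions r
    ON  u.country = r.country
    AND COALESCE(u.state, '') = COALESCE(r.state, '')
    AND u.city = r.city
  WHERE r.city IS NOT NULL

  UNION ALL

  -- State-level regions: country and state must match, any city.
  SELECT r.region_id, u.user_id, u.city, u.state, u.country
  FROM users u
  JOIN regions r
    ON  u.country = r.country
    AND u.state = r.state
  WHERE r.city IS NULL
    AND r.state IS NOT NULL

  UNION ALL

  -- Country-level regions: only the country must match.
  SELECT r.region_id, u.user_id, u.city, u.state, u.country
  FROM users u
  JOIN regions r
    ON u.country = r.country
  WHERE r.city IS NULL
    AND r.state IS NULL
) matched;
```

On the sample data this should pair user 103 with region 3 and users 105 and 106 with region 2, which is the result table in the question. If regions can overlap (say, a country-level CA region alongside the Vancouver one), a user will appear once per matching region, so you would need to decide whether that is wanted or whether the most specific level should win. The UNION is wrapped in a subquery because some older Hive releases reportedly do not accept it at the top level.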
edited Nov 19 at 2:08
answered Nov 19 at 2:02
Used_By_Already