Query with conditional lag statement
I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:
| user_id | session_id | time | referrer |
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |
I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id
should be 1.
However, by just using lag(session_id) over (partition by user_id order by time)
, I will get parent_session_id
=2 for the 3rd row.
I suspect it can be done using a combination of window functions, but I just can't figure it out.
google-bigquery standard-sql
add a comment |
I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:
| user_id | session_id | time | referrer |
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |
I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id
should be 1.
However, by just using lag(session_id) over (partition by user_id order by time)
, I will get parent_session_id
=2 for the 3rd row.
I suspect it can be done using a combination of window functions, but I just can't figure it out.
google-bigquery standard-sql
add a comment |
I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:
| user_id | session_id | time | referrer |
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |
I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id
should be 1.
However, by just using lag(session_id) over (partition by user_id order by time)
, I will get parent_session_id
=2 for the 3rd row.
I suspect it can be done using a combination of window functions, but I just can't figure it out.
google-bigquery standard-sql
I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:
| user_id | session_id | time | referrer |
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |
I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id
should be 1.
However, by just using lag(session_id) over (partition by user_id order by time)
, I will get parent_session_id
=2 for the 3rd row.
I suspect it can be done using a combination of window functions, but I just can't figure it out.
google-bigquery standard-sql
google-bigquery standard-sql
asked Nov 22 '18 at 7:18
Moon_WatcherMoon_Watcher
1141112
1141112
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
I'd use last_value() in combination with if():
WITH t AS (SELECT * FROM UNNEST([
struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
(1,2,date('2018-02-01'), 'google.com'),
(1,3,date('2018-03-01'), 'google.com')
]) )
SELECT
*,
last_value(IF(referrer is null, session_id, NULL) ignore nulls)
over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
add a comment |
You could even do this via a correlated subquery:
SELECT
session_id,
(SELECT MAX(t2.session_id) FROM yourTable t2
WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
session_id;
Here is an approach using analytic functions which might work:
WITH cte AS (
SELECT *,
SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
OVER (ORDER BY session_id) cnt
FROM yourTable
)
SELECT
session_id,
CASE WHEN cnt = 0
THEN NULL
ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
FROM cte
ORDER BY
session_id;
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53425697%2fquery-with-conditional-lag-statement%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'd use last_value() in combination with if():
WITH t AS (SELECT * FROM UNNEST([
struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
(1,2,date('2018-02-01'), 'google.com'),
(1,3,date('2018-03-01'), 'google.com')
]) )
SELECT
*,
last_value(IF(referrer is null, session_id, NULL) ignore nulls)
over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
add a comment |
I'd use last_value() in combination with if():
WITH t AS (SELECT * FROM UNNEST([
struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
(1,2,date('2018-02-01'), 'google.com'),
(1,3,date('2018-03-01'), 'google.com')
]) )
SELECT
*,
last_value(IF(referrer is null, session_id, NULL) ignore nulls)
over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
add a comment |
I'd use last_value() in combination with if():
WITH t AS (SELECT * FROM UNNEST([
struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
(1,2,date('2018-02-01'), 'google.com'),
(1,3,date('2018-03-01'), 'google.com')
]) )
SELECT
*,
last_value(IF(referrer is null, session_id, NULL) ignore nulls)
over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t
I'd use last_value() in combination with if():
WITH t AS (SELECT * FROM UNNEST([
struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
(1,2,date('2018-02-01'), 'google.com'),
(1,3,date('2018-03-01'), 'google.com')
]) )
SELECT
*,
last_value(IF(referrer is null, session_id, NULL) ignore nulls)
over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
FROM t
edited Nov 22 '18 at 10:38
answered Nov 22 '18 at 7:40
Martin WeitzmannMartin Weitzmann
1,844513
1,844513
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
add a comment |
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
perfect, thanks!
– Moon_Watcher
Nov 23 '18 at 10:17
add a comment |
You could even do this via a correlated subquery:
SELECT
session_id,
(SELECT MAX(t2.session_id) FROM yourTable t2
WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
session_id;
Here is an approach using analytic functions which might work:
WITH cte AS (
SELECT *,
SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
OVER (ORDER BY session_id) cnt
FROM yourTable
)
SELECT
session_id,
CASE WHEN cnt = 0
THEN NULL
ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
FROM cte
ORDER BY
session_id;
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
add a comment |
You could even do this via a correlated subquery:
SELECT
session_id,
(SELECT MAX(t2.session_id) FROM yourTable t2
WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
session_id;
Here is an approach using analytic functions which might work:
WITH cte AS (
SELECT *,
SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
OVER (ORDER BY session_id) cnt
FROM yourTable
)
SELECT
session_id,
CASE WHEN cnt = 0
THEN NULL
ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
FROM cte
ORDER BY
session_id;
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
add a comment |
You could even do this via a correlated subquery:
SELECT
session_id,
(SELECT MAX(t2.session_id) FROM yourTable t2
WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
session_id;
Here is an approach using analytic functions which might work:
WITH cte AS (
SELECT *,
SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
OVER (ORDER BY session_id) cnt
FROM yourTable
)
SELECT
session_id,
CASE WHEN cnt = 0
THEN NULL
ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
FROM cte
ORDER BY
session_id;
You could even do this via a correlated subquery:
SELECT
session_id,
(SELECT MAX(t2.session_id) FROM yourTable t2
WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
FROM yourTable t1
ORDER BY
session_id;
Here is an approach using analytic functions which might work:
WITH cte AS (
SELECT *,
SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
OVER (ORDER BY session_id) cnt
FROM yourTable
)
SELECT
session_id,
CASE WHEN cnt = 0
THEN NULL
ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
FROM cte
ORDER BY
session_id;
edited Nov 22 '18 at 7:47
answered Nov 22 '18 at 7:26
Tim BiegeleisenTim Biegeleisen
227k1394147
227k1394147
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
add a comment |
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
true, but I'm hoping for a more efficient solution that doesn't involve a self join
– Moon_Watcher
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
I'm not doing a self join, but yes this may not run fast for very large tables.
– Tim Biegeleisen
Nov 22 '18 at 7:30
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
yeah, the table I'm working with contains web traffic data and hence pretty huge
– Moon_Watcher
Nov 22 '18 at 7:31
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
@Moon_Watcher I gave you an option using analytic functions.
– Tim Biegeleisen
Nov 22 '18 at 7:41
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53425697%2fquery-with-conditional-lag-statement%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown