Query with conditional lag statement












1















I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:



| user_id | session_id | time       | referrer   |  
|---------|------------|------------|------------|
| 1 | 1 | 2018-01-01 | [NULL] |
| 1 | 2 | 2018-02-01 | google.com |
| 1 | 3 | 2018-03-01 | google.com |


I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id should be 1.



However, by just using lag(session_id) over (partition by user_id order by time), I will get parent_session_id=2 for the 3rd row.



I suspect it can be done using a combination of window functions, but I just can't figure it out.










share|improve this question



























    1















    I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:



    | user_id | session_id | time       | referrer   |  
    |---------|------------|------------|------------|
    | 1 | 1 | 2018-01-01 | [NULL] |
    | 1 | 2 | 2018-02-01 | google.com |
    | 1 | 3 | 2018-03-01 | google.com |


    I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id should be 1.



    However, by just using lag(session_id) over (partition by user_id order by time), I will get parent_session_id=2 for the 3rd row.



    I suspect it can be done using a combination of window functions, but I just can't figure it out.










    share|improve this question

























      1












      1








      1








      I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:



      | user_id | session_id | time       | referrer   |  
      |---------|------------|------------|------------|
      | 1 | 1 | 2018-01-01 | [NULL] |
      | 1 | 2 | 2018-02-01 | google.com |
      | 1 | 3 | 2018-03-01 | google.com |


      I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id should be 1.



      However, by just using lag(session_id) over (partition by user_id order by time), I will get parent_session_id=2 for the 3rd row.



      I suspect it can be done using a combination of window functions, but I just can't figure it out.










      share|improve this question














      I'm trying to find the previous value of a column where the row meets some criteria. Consider the table:



      | user_id | session_id | time       | referrer   |  
      |---------|------------|------------|------------|
      | 1 | 1 | 2018-01-01 | [NULL] |
      | 1 | 2 | 2018-02-01 | google.com |
      | 1 | 3 | 2018-03-01 | google.com |


      I want to find, for each session, the previous value of session_id where the referrer is NULL. So, for the second AND third rows, the value of parent_session_id should be 1.



      However, by just using lag(session_id) over (partition by user_id order by time), I will get parent_session_id=2 for the 3rd row.



      I suspect it can be done using a combination of window functions, but I just can't figure it out.







      google-bigquery standard-sql






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 22 '18 at 7:18









      Moon_WatcherMoon_Watcher

      1141112




      1141112
























          2 Answers
          2






          active

          oldest

          votes


















          1














          I'd use last_value() in combination with if():



          WITH t AS (SELECT * FROM UNNEST([ 
          struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
          (1,2,date('2018-02-01'), 'google.com'),
          (1,3,date('2018-03-01'), 'google.com')
          ]) )

          SELECT
          *,
          last_value(IF(referrer is null, session_id, NULL) ignore nulls)
          over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
          FROM t





          share|improve this answer


























          • perfect, thanks!

            – Moon_Watcher
            Nov 23 '18 at 10:17



















          1














          You could even do this via a correlated subquery:



          SELECT
          session_id,
          (SELECT MAX(t2.session_id) FROM yourTable t2
          WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
          FROM yourTable t1
          ORDER BY
          session_id;


          Here is an approach using analytic functions which might work:



          WITH cte AS (
          SELECT *,
          SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
          OVER (ORDER BY session_id) cnt
          FROM yourTable
          )

          SELECT
          session_id,
          CASE WHEN cnt = 0
          THEN NULL
          ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
          FROM cte
          ORDER BY
          session_id;





          share|improve this answer


























          • true, but I'm hoping for a more efficient solution that doesn't involve a self join

            – Moon_Watcher
            Nov 22 '18 at 7:30











          • I'm not doing a self join, but yes this may not run fast for very large tables.

            – Tim Biegeleisen
            Nov 22 '18 at 7:30











          • yeah, the table I'm working with contains web traffic data and hence pretty huge

            – Moon_Watcher
            Nov 22 '18 at 7:31











          • @Moon_Watcher I gave you an option using analytic functions.

            – Tim Biegeleisen
            Nov 22 '18 at 7:41











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53425697%2fquery-with-conditional-lag-statement%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          I'd use last_value() in combination with if():



          WITH t AS (SELECT * FROM UNNEST([ 
          struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
          (1,2,date('2018-02-01'), 'google.com'),
          (1,3,date('2018-03-01'), 'google.com')
          ]) )

          SELECT
          *,
          last_value(IF(referrer is null, session_id, NULL) ignore nulls)
          over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
          FROM t





          share|improve this answer


























          • perfect, thanks!

            – Moon_Watcher
            Nov 23 '18 at 10:17
















          1














          I'd use last_value() in combination with if():



          WITH t AS (SELECT * FROM UNNEST([ 
          struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
          (1,2,date('2018-02-01'), 'google.com'),
          (1,3,date('2018-03-01'), 'google.com')
          ]) )

          SELECT
          *,
          last_value(IF(referrer is null, session_id, NULL) ignore nulls)
          over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
          FROM t





          share|improve this answer


























          • perfect, thanks!

            – Moon_Watcher
            Nov 23 '18 at 10:17














          1












          1








          1







          I'd use last_value() in combination with if():



          WITH t AS (SELECT * FROM UNNEST([ 
          struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
          (1,2,date('2018-02-01'), 'google.com'),
          (1,3,date('2018-03-01'), 'google.com')
          ]) )

          SELECT
          *,
          last_value(IF(referrer is null, session_id, NULL) ignore nulls)
          over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
          FROM t





          share|improve this answer















          I'd use last_value() in combination with if():



          WITH t AS (SELECT * FROM UNNEST([ 
          struct<user_id int64, session_id int64, time date, referrer string>(1, 1, date('2018-01-01'), NULL),
          (1,2,date('2018-02-01'), 'google.com'),
          (1,3,date('2018-03-01'), 'google.com')
          ]) )

          SELECT
          *,
          last_value(IF(referrer is null, session_id, NULL) ignore nulls)
          over (partition by user_id order by time rows between unbounded preceding and 1 preceding) lastNullrefSession
          FROM t






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 22 '18 at 10:38

























          answered Nov 22 '18 at 7:40









          Martin WeitzmannMartin Weitzmann

          1,844513




          1,844513













          • perfect, thanks!

            – Moon_Watcher
            Nov 23 '18 at 10:17



















          • perfect, thanks!

            – Moon_Watcher
            Nov 23 '18 at 10:17

















          perfect, thanks!

          – Moon_Watcher
          Nov 23 '18 at 10:17





          perfect, thanks!

          – Moon_Watcher
          Nov 23 '18 at 10:17













          1














          You could even do this via a correlated subquery:



          SELECT
          session_id,
          (SELECT MAX(t2.session_id) FROM yourTable t2
          WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
          FROM yourTable t1
          ORDER BY
          session_id;


          Here is an approach using analytic functions which might work:



          WITH cte AS (
          SELECT *,
          SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
          OVER (ORDER BY session_id) cnt
          FROM yourTable
          )

          SELECT
          session_id,
          CASE WHEN cnt = 0
          THEN NULL
          ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
          FROM cte
          ORDER BY
          session_id;





          share|improve this answer


























          • true, but I'm hoping for a more efficient solution that doesn't involve a self join

            – Moon_Watcher
            Nov 22 '18 at 7:30











          • I'm not doing a self join, but yes this may not run fast for very large tables.

            – Tim Biegeleisen
            Nov 22 '18 at 7:30











          • yeah, the table I'm working with contains web traffic data and hence pretty huge

            – Moon_Watcher
            Nov 22 '18 at 7:31











          • @Moon_Watcher I gave you an option using analytic functions.

            – Tim Biegeleisen
            Nov 22 '18 at 7:41
















          1














          You could even do this via a correlated subquery:



          SELECT
          session_id,
          (SELECT MAX(t2.session_id) FROM yourTable t2
          WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
          FROM yourTable t1
          ORDER BY
          session_id;


          Here is an approach using analytic functions which might work:



          WITH cte AS (
          SELECT *,
          SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
          OVER (ORDER BY session_id) cnt
          FROM yourTable
          )

          SELECT
          session_id,
          CASE WHEN cnt = 0
          THEN NULL
          ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
          FROM cte
          ORDER BY
          session_id;





          share|improve this answer


























          • true, but I'm hoping for a more efficient solution that doesn't involve a self join

            – Moon_Watcher
            Nov 22 '18 at 7:30











          • I'm not doing a self join, but yes this may not run fast for very large tables.

            – Tim Biegeleisen
            Nov 22 '18 at 7:30











          • yeah, the table I'm working with contains web traffic data and hence pretty huge

            – Moon_Watcher
            Nov 22 '18 at 7:31











          • @Moon_Watcher I gave you an option using analytic functions.

            – Tim Biegeleisen
            Nov 22 '18 at 7:41














          1












          1








          1







          You could even do this via a correlated subquery:



          SELECT
          session_id,
          (SELECT MAX(t2.session_id) FROM yourTable t2
          WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
          FROM yourTable t1
          ORDER BY
          session_id;


          Here is an approach using analytic functions which might work:



          WITH cte AS (
          SELECT *,
          SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
          OVER (ORDER BY session_id) cnt
          FROM yourTable
          )

          SELECT
          session_id,
          CASE WHEN cnt = 0
          THEN NULL
          ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
          FROM cte
          ORDER BY
          session_id;





          share|improve this answer















          You could even do this via a correlated subquery:



          SELECT
          session_id,
          (SELECT MAX(t2.session_id) FROM yourTable t2
          WHERE t2.referrer IS NULL AND t2.session_id < t1.session_id) prev_session_id
          FROM yourTable t1
          ORDER BY
          session_id;


          Here is an approach using analytic functions which might work:



          WITH cte AS (
          SELECT *,
          SUM(CASE WHEN referrer IS NULL THEN 1 ELSE 0 END)
          OVER (ORDER BY session_id) cnt
          FROM yourTable
          )

          SELECT
          session_id,
          CASE WHEN cnt = 0
          THEN NULL
          ELSE MIN(session_id) OVER (PARTITION BY cnt) END prev_session_id
          FROM cte
          ORDER BY
          session_id;






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 22 '18 at 7:47

























          answered Nov 22 '18 at 7:26









          Tim BiegeleisenTim Biegeleisen

          227k1394147




          227k1394147













          • true, but I'm hoping for a more efficient solution that doesn't involve a self join

            – Moon_Watcher
            Nov 22 '18 at 7:30











          • I'm not doing a self join, but yes this may not run fast for very large tables.

            – Tim Biegeleisen
            Nov 22 '18 at 7:30











          • yeah, the table I'm working with contains web traffic data and hence pretty huge

            – Moon_Watcher
            Nov 22 '18 at 7:31











          • @Moon_Watcher I gave you an option using analytic functions.

            – Tim Biegeleisen
            Nov 22 '18 at 7:41



















          • true, but I'm hoping for a more efficient solution that doesn't involve a self join

            – Moon_Watcher
            Nov 22 '18 at 7:30











          • I'm not doing a self join, but yes this may not run fast for very large tables.

            – Tim Biegeleisen
            Nov 22 '18 at 7:30











          • yeah, the table I'm working with contains web traffic data and hence pretty huge

            – Moon_Watcher
            Nov 22 '18 at 7:31











          • @Moon_Watcher I gave you an option using analytic functions.

            – Tim Biegeleisen
            Nov 22 '18 at 7:41

















          true, but I'm hoping for a more efficient solution that doesn't involve a self join

          – Moon_Watcher
          Nov 22 '18 at 7:30





          true, but I'm hoping for a more efficient solution that doesn't involve a self join

          – Moon_Watcher
          Nov 22 '18 at 7:30













          I'm not doing a self join, but yes this may not run fast for very large tables.

          – Tim Biegeleisen
          Nov 22 '18 at 7:30





          I'm not doing a self join, but yes this may not run fast for very large tables.

          – Tim Biegeleisen
          Nov 22 '18 at 7:30













          yeah, the table I'm working with contains web traffic data and hence pretty huge

          – Moon_Watcher
          Nov 22 '18 at 7:31





          yeah, the table I'm working with contains web traffic data and hence pretty huge

          – Moon_Watcher
          Nov 22 '18 at 7:31













          @Moon_Watcher I gave you an option using analytic functions.

          – Tim Biegeleisen
          Nov 22 '18 at 7:41





          @Moon_Watcher I gave you an option using analytic functions.

          – Tim Biegeleisen
          Nov 22 '18 at 7:41


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53425697%2fquery-with-conditional-lag-statement%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

          Alcedinidae

          RAC Tourist Trophy