Strange query plan when using OR in JOIN clause - Constant scan for every row in table












12














I'm trying to produce a sample query plan to show why UNIONing two result-sets can be better than using OR in a JOIN clause. A query plan I've written has me stumped. I'm using the StackOverflow database with a nonclustered index on Users.Reputation.



Query plan image
The query is



CREATE NONCLUSTERED INDEX IX_NC_REPUTATION ON dbo.USERS(Reputation)
SELECT DISTINCT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
OR Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


The query plan is at https://www.brentozar.com/pastetheplan/?id=BkpZU1MZE, query duration for me is 4:37 min, 26612 rows returned.



I haven't seen this style of constant-scan being created from an existing table before - I'm unfamiliar with why there's a constant scan ran for every single row, when a constant scan is usually used for a single row inputted by the user for example SELECT GETDATE(). Why is it used here? I would really appreciate some guidance in reading this query plan.



If I split that OR out into a UNION, it produces a standard plan running in 12 seconds with the same 26612 rows returned.



SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
WHERE Users.Reputation = 5
UNION
SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


I interpret this plan as doing this:




  • Get all 41782500 rows from Posts (the actual number of rows matches the CI scan on Posts)

  • For each 41782500 rows in Posts:


    • Produce scalars:

    • Expr1005: OwnerUserId

    • Expr1006: OwnerUserId

    • Expr1004: The static value 62

    • Expr1008: LastEditorUserId

    • Expr1009: LastEditorUserId

    • Expr1007: The static value 62



  • In the concatenate:


    • Exp1010: If Expr1005 (OwnerUserId) is not null, use that else use Expr1008 (LastEditorUserID)

    • Expr1011: If Expr1006 (OwnerUserId) is not null, use that, else use Expr1009 (LastEditorUserId)

    • Expr1012: If Expr1004 (62) is null use that, else use Expr1007 (62)



  • In the Compute scalar: I don't know what an ampersand does.


    • Expr1013: 4 [and?] 62 (Expr1012) = 4 and OwnerUserId IS NULL (NULL = Expr1010)

    • Expr1014: 4 [and?] 62 (Expr1012)

    • Expr1015: 16 and 62 (Expr1012)



  • In the Order By sort by:


    • Expr1013 Desc

    • Expr1014 Asc

    • Expr1010 Asc

    • Expr1015 Desc



  • In the Merge Interval it removed Expr1013 and Expr1015 (these are inputs but not outputs)

  • In the Index seek below the nested loops join it is using Expr1010 and Expr1011 as seek predicates, but I don't understand how it has access to these when it hasn't done the nested loop join from IX_NC_REPUTATION to the subtree containing Expr1010 and Expr1011.

  • The Nested Loops join returns only the Users.IDs that have a match in the earlier subtree. Because of predicate pushdown, all rows returned from the index seek on IX_NC_REPUTATION are returned.

  • The last Nested Loops join: For each Posts record, output Users.Id where a match is found in the below dataset.










share|improve this question









New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • Did you try with an EXISTS subquery or subqueries? SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND ( EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.OwnerUserId) OR EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago












  • one subquery: SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id IN (Posts.OwnerUserId, Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago
















12














I'm trying to produce a sample query plan to show why UNIONing two result-sets can be better than using OR in a JOIN clause. A query plan I've written has me stumped. I'm using the StackOverflow database with a nonclustered index on Users.Reputation.



Query plan image
The query is



CREATE NONCLUSTERED INDEX IX_NC_REPUTATION ON dbo.USERS(Reputation)
SELECT DISTINCT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
OR Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


The query plan is at https://www.brentozar.com/pastetheplan/?id=BkpZU1MZE, query duration for me is 4:37 min, 26612 rows returned.



I haven't seen this style of constant-scan being created from an existing table before - I'm unfamiliar with why there's a constant scan ran for every single row, when a constant scan is usually used for a single row inputted by the user for example SELECT GETDATE(). Why is it used here? I would really appreciate some guidance in reading this query plan.



If I split that OR out into a UNION, it produces a standard plan running in 12 seconds with the same 26612 rows returned.



SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
WHERE Users.Reputation = 5
UNION
SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


I interpret this plan as doing this:




  • Get all 41782500 rows from Posts (the actual number of rows matches the CI scan on Posts)

  • For each 41782500 rows in Posts:


    • Produce scalars:

    • Expr1005: OwnerUserId

    • Expr1006: OwnerUserId

    • Expr1004: The static value 62

    • Expr1008: LastEditorUserId

    • Expr1009: LastEditorUserId

    • Expr1007: The static value 62



  • In the concatenate:


    • Exp1010: If Expr1005 (OwnerUserId) is not null, use that else use Expr1008 (LastEditorUserID)

    • Expr1011: If Expr1006 (OwnerUserId) is not null, use that, else use Expr1009 (LastEditorUserId)

    • Expr1012: If Expr1004 (62) is null use that, else use Expr1007 (62)



  • In the Compute scalar: I don't know what an ampersand does.


    • Expr1013: 4 [and?] 62 (Expr1012) = 4 and OwnerUserId IS NULL (NULL = Expr1010)

    • Expr1014: 4 [and?] 62 (Expr1012)

    • Expr1015: 16 and 62 (Expr1012)



  • In the Order By sort by:


    • Expr1013 Desc

    • Expr1014 Asc

    • Expr1010 Asc

    • Expr1015 Desc



  • In the Merge Interval it removed Expr1013 and Expr1015 (these are inputs but not outputs)

  • In the Index seek below the nested loops join it is using Expr1010 and Expr1011 as seek predicates, but I don't understand how it has access to these when it hasn't done the nested loop join from IX_NC_REPUTATION to the subtree containing Expr1010 and Expr1011.

  • The Nested Loops join returns only the Users.IDs that have a match in the earlier subtree. Because of predicate pushdown, all rows returned from the index seek on IX_NC_REPUTATION are returned.

  • The last Nested Loops join: For each Posts record, output Users.Id where a match is found in the below dataset.










share|improve this question









New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • Did you try with an EXISTS subquery or subqueries? SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND ( EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.OwnerUserId) OR EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago












  • one subquery: SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id IN (Posts.OwnerUserId, Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago














12












12








12


3





I'm trying to produce a sample query plan to show why UNIONing two result-sets can be better than using OR in a JOIN clause. A query plan I've written has me stumped. I'm using the StackOverflow database with a nonclustered index on Users.Reputation.



Query plan image
The query is



CREATE NONCLUSTERED INDEX IX_NC_REPUTATION ON dbo.USERS(Reputation)
SELECT DISTINCT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
OR Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


The query plan is at https://www.brentozar.com/pastetheplan/?id=BkpZU1MZE, query duration for me is 4:37 min, 26612 rows returned.



I haven't seen this style of constant-scan being created from an existing table before - I'm unfamiliar with why there's a constant scan ran for every single row, when a constant scan is usually used for a single row inputted by the user for example SELECT GETDATE(). Why is it used here? I would really appreciate some guidance in reading this query plan.



If I split that OR out into a UNION, it produces a standard plan running in 12 seconds with the same 26612 rows returned.



SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
WHERE Users.Reputation = 5
UNION
SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


I interpret this plan as doing this:




  • Get all 41782500 rows from Posts (the actual number of rows matches the CI scan on Posts)

  • For each 41782500 rows in Posts:


    • Produce scalars:

    • Expr1005: OwnerUserId

    • Expr1006: OwnerUserId

    • Expr1004: The static value 62

    • Expr1008: LastEditorUserId

    • Expr1009: LastEditorUserId

    • Expr1007: The static value 62



  • In the concatenate:


    • Exp1010: If Expr1005 (OwnerUserId) is not null, use that else use Expr1008 (LastEditorUserID)

    • Expr1011: If Expr1006 (OwnerUserId) is not null, use that, else use Expr1009 (LastEditorUserId)

    • Expr1012: If Expr1004 (62) is null use that, else use Expr1007 (62)



  • In the Compute scalar: I don't know what an ampersand does.


    • Expr1013: 4 [and?] 62 (Expr1012) = 4 and OwnerUserId IS NULL (NULL = Expr1010)

    • Expr1014: 4 [and?] 62 (Expr1012)

    • Expr1015: 16 and 62 (Expr1012)



  • In the Order By sort by:


    • Expr1013 Desc

    • Expr1014 Asc

    • Expr1010 Asc

    • Expr1015 Desc



  • In the Merge Interval it removed Expr1013 and Expr1015 (these are inputs but not outputs)

  • In the Index seek below the nested loops join it is using Expr1010 and Expr1011 as seek predicates, but I don't understand how it has access to these when it hasn't done the nested loop join from IX_NC_REPUTATION to the subtree containing Expr1010 and Expr1011.

  • The Nested Loops join returns only the Users.IDs that have a match in the earlier subtree. Because of predicate pushdown, all rows returned from the index seek on IX_NC_REPUTATION are returned.

  • The last Nested Loops join: For each Posts record, output Users.Id where a match is found in the below dataset.










share|improve this question









New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I'm trying to produce a sample query plan to show why UNIONing two result-sets can be better than using OR in a JOIN clause. A query plan I've written has me stumped. I'm using the StackOverflow database with a nonclustered index on Users.Reputation.



Query plan image
The query is



CREATE NONCLUSTERED INDEX IX_NC_REPUTATION ON dbo.USERS(Reputation)
SELECT DISTINCT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
OR Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


The query plan is at https://www.brentozar.com/pastetheplan/?id=BkpZU1MZE, query duration for me is 4:37 min, 26612 rows returned.



I haven't seen this style of constant-scan being created from an existing table before - I'm unfamiliar with why there's a constant scan ran for every single row, when a constant scan is usually used for a single row inputted by the user for example SELECT GETDATE(). Why is it used here? I would really appreciate some guidance in reading this query plan.



If I split that OR out into a UNION, it produces a standard plan running in 12 seconds with the same 26612 rows returned.



SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.OwnerUserId
WHERE Users.Reputation = 5
UNION
SELECT Users.Id
FROM dbo.Users
INNER JOIN dbo.Posts
ON Users.Id = Posts.LastEditorUserId
WHERE Users.Reputation = 5


I interpret this plan as doing this:




  • Get all 41782500 rows from Posts (the actual number of rows matches the CI scan on Posts)

  • For each 41782500 rows in Posts:


    • Produce scalars:

    • Expr1005: OwnerUserId

    • Expr1006: OwnerUserId

    • Expr1004: The static value 62

    • Expr1008: LastEditorUserId

    • Expr1009: LastEditorUserId

    • Expr1007: The static value 62



  • In the concatenate:


    • Exp1010: If Expr1005 (OwnerUserId) is not null, use that else use Expr1008 (LastEditorUserID)

    • Expr1011: If Expr1006 (OwnerUserId) is not null, use that, else use Expr1009 (LastEditorUserId)

    • Expr1012: If Expr1004 (62) is null use that, else use Expr1007 (62)



  • In the Compute scalar: I don't know what an ampersand does.


    • Expr1013: 4 [and?] 62 (Expr1012) = 4 and OwnerUserId IS NULL (NULL = Expr1010)

    • Expr1014: 4 [and?] 62 (Expr1012)

    • Expr1015: 16 and 62 (Expr1012)



  • In the Order By sort by:


    • Expr1013 Desc

    • Expr1014 Asc

    • Expr1010 Asc

    • Expr1015 Desc



  • In the Merge Interval it removed Expr1013 and Expr1015 (these are inputs but not outputs)

  • In the Index seek below the nested loops join it is using Expr1010 and Expr1011 as seek predicates, but I don't understand how it has access to these when it hasn't done the nested loop join from IX_NC_REPUTATION to the subtree containing Expr1010 and Expr1011.

  • The Nested Loops join returns only the Users.IDs that have a match in the earlier subtree. Because of predicate pushdown, all rows returned from the index seek on IX_NC_REPUTATION are returned.

  • The last Nested Loops join: For each Posts record, output Users.Id where a match is found in the below dataset.







sql-server execution-plan sql-server-2017






share|improve this question









New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 2 days ago









jadarnel27

3,5551330




3,5551330






New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 2 days ago









Andrew

634




634




New contributor




Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Andrew is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • Did you try with an EXISTS subquery or subqueries? SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND ( EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.OwnerUserId) OR EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago












  • one subquery: SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id IN (Posts.OwnerUserId, Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago


















  • Did you try with an EXISTS subquery or subqueries? SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND ( EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.OwnerUserId) OR EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago












  • one subquery: SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id IN (Posts.OwnerUserId, Posts.LastEditorUserId) ) ;
    – yper-crazyhat-cubeᵀᴹ
    2 days ago
















Did you try with an EXISTS subquery or subqueries? SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND ( EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.OwnerUserId) OR EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.LastEditorUserId) ) ;
– yper-crazyhat-cubeᵀᴹ
2 days ago






Did you try with an EXISTS subquery or subqueries? SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND ( EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.OwnerUserId) OR EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id = Posts.LastEditorUserId) ) ;
– yper-crazyhat-cubeᵀᴹ
2 days ago














one subquery: SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id IN (Posts.OwnerUserId, Posts.LastEditorUserId) ) ;
– yper-crazyhat-cubeᵀᴹ
2 days ago




one subquery: SELECT Users.Id FROM dbo.Users WHERE Users.Reputation = 5 AND EXISTS (SELECT 1 FROM dbo.Posts WHERE Users.Id IN (Posts.OwnerUserId, Posts.LastEditorUserId) ) ;
– yper-crazyhat-cubeᵀᴹ
2 days ago










1 Answer
1






active

oldest

votes


















10














The plan is similar to the one I go into in more detail here.



The Posts table is scanned.



For each row it extracts the OwnerUserId and LastEditorUserId. This is in a similar manner to the way UNPIVOT works. You see a single constant scan operator in the plan for the below creating the two output rows for each input row.



SELECT *
FROM dbo.Posts
UNPIVOT (X FOR U IN (OwnerUserId,LastEditorUserId)) Unpvt


In this case the plan is a bit more complex as the semantics for or are that if both column values are the same only one row should be emitted from the join on Users (not two)



These are then put through the merge interval so that in the event the values are the same the range is collapsed down and only one seek is executed against Users - otherwise two seeks are executed against it.



The value 62 is a flag meaning that the seek should be an equality seek.



Regarding




I don't understand how it has access to these when it hasn't done the
nested loop join from IX_NC_REPUTATION to the subtree containing
Expr1010 and Expr1011




These are defined in the yellow highlighted concatenation operator. This is on the outer side of the yellow highlighted nested loops. So this runs before the yellow highlighted seek on the inside of that nested loops.



enter image description here



A rewrite that gives a similar plan (though with the merge interval replaced by a merge union) is below in case this helps.



SELECT DISTINCT D2.UserId
FROM dbo.Posts p
CROSS APPLY (SELECT Users.Id AS UserId
FROM (SELECT p.OwnerUserId
UNION /*collapse duplicate to single row*/
SELECT p.LastEditorUserId) D1(UserId)
JOIN Users
ON Users.Id = D1.UserId) D2
OPTION (FORCE ORDER)


enter image description here



Dependant on what indexes are available on the Posts table a variant of this query may be more efficient than your proposed UNION ALL solution. (the copy of the database I have has no useful index for this and the proposed solution does two full scans of Posts. The below does it in one scan)



WITH Unpivoted AS
(
SELECT UserId
FROM dbo.Posts
UNPIVOT (UserId FOR U IN (OwnerUserId,LastEditorUserId)) Unpivoted
)
SELECT DISTINCT Users.Id
FROM dbo.Users INNER HASH JOIN Unpivoted
ON Users.Id = Unpivoted.UserId
WHERE Users.Reputation = 5


enter image description here






share|improve this answer























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    Andrew is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f225835%2fstrange-query-plan-when-using-or-in-join-clause-constant-scan-for-every-row-in%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    10














    The plan is similar to the one I go into in more detail here.



    The Posts table is scanned.



    For each row it extracts the OwnerUserId and LastEditorUserId. This is in a similar manner to the way UNPIVOT works. You see a single constant scan operator in the plan for the below creating the two output rows for each input row.



    SELECT *
    FROM dbo.Posts
    UNPIVOT (X FOR U IN (OwnerUserId,LastEditorUserId)) Unpvt


    In this case the plan is a bit more complex as the semantics for or are that if both column values are the same only one row should be emitted from the join on Users (not two)



    These are then put through the merge interval so that in the event the values are the same the range is collapsed down and only one seek is executed against Users - otherwise two seeks are executed against it.



    The value 62 is a flag meaning that the seek should be an equality seek.



    Regarding




    I don't understand how it has access to these when it hasn't done the
    nested loop join from IX_NC_REPUTATION to the subtree containing
    Expr1010 and Expr1011




    These are defined in the yellow highlighted concatenation operator. This is on the outer side of the yellow highlighted nested loops. So this runs before the yellow highlighted seek on the inside of that nested loops.



    enter image description here



    A rewrite that gives a similar plan (though with the merge interval replaced by a merge union) is below in case this helps.



    SELECT DISTINCT D2.UserId
    FROM dbo.Posts p
    CROSS APPLY (SELECT Users.Id AS UserId
    FROM (SELECT p.OwnerUserId
    UNION /*collapse duplicate to single row*/
    SELECT p.LastEditorUserId) D1(UserId)
    JOIN Users
    ON Users.Id = D1.UserId) D2
    OPTION (FORCE ORDER)


    enter image description here



    Dependant on what indexes are available on the Posts table a variant of this query may be more efficient than your proposed UNION ALL solution. (the copy of the database I have has no useful index for this and the proposed solution does two full scans of Posts. The below does it in one scan)



    WITH Unpivoted AS
    (
    SELECT UserId
    FROM dbo.Posts
    UNPIVOT (UserId FOR U IN (OwnerUserId,LastEditorUserId)) Unpivoted
    )
    SELECT DISTINCT Users.Id
    FROM dbo.Users INNER HASH JOIN Unpivoted
    ON Users.Id = Unpivoted.UserId
    WHERE Users.Reputation = 5


    enter image description here






    share|improve this answer




























      10














      The plan is similar to the one I go into in more detail here.



      The Posts table is scanned.



      For each row it extracts the OwnerUserId and LastEditorUserId. This is in a similar manner to the way UNPIVOT works. You see a single constant scan operator in the plan for the below creating the two output rows for each input row.



      SELECT *
      FROM dbo.Posts
      UNPIVOT (X FOR U IN (OwnerUserId,LastEditorUserId)) Unpvt


      In this case the plan is a bit more complex as the semantics for or are that if both column values are the same only one row should be emitted from the join on Users (not two)



      These are then put through the merge interval so that in the event the values are the same the range is collapsed down and only one seek is executed against Users - otherwise two seeks are executed against it.



      The value 62 is a flag meaning that the seek should be an equality seek.



      Regarding




      I don't understand how it has access to these when it hasn't done the
      nested loop join from IX_NC_REPUTATION to the subtree containing
      Expr1010 and Expr1011




      These are defined in the yellow highlighted concatenation operator. This is on the outer side of the yellow highlighted nested loops. So this runs before the yellow highlighted seek on the inside of that nested loops.



      enter image description here



      A rewrite that gives a similar plan (though with the merge interval replaced by a merge union) is below in case this helps.



      SELECT DISTINCT D2.UserId
      FROM dbo.Posts p
      CROSS APPLY (SELECT Users.Id AS UserId
      FROM (SELECT p.OwnerUserId
      UNION /*collapse duplicate to single row*/
      SELECT p.LastEditorUserId) D1(UserId)
      JOIN Users
      ON Users.Id = D1.UserId) D2
      OPTION (FORCE ORDER)


      enter image description here



      Dependant on what indexes are available on the Posts table a variant of this query may be more efficient than your proposed UNION ALL solution. (the copy of the database I have has no useful index for this and the proposed solution does two full scans of Posts. The below does it in one scan)



      WITH Unpivoted AS
      (
      SELECT UserId
      FROM dbo.Posts
      UNPIVOT (UserId FOR U IN (OwnerUserId,LastEditorUserId)) Unpivoted
      )
      SELECT DISTINCT Users.Id
      FROM dbo.Users INNER HASH JOIN Unpivoted
      ON Users.Id = Unpivoted.UserId
      WHERE Users.Reputation = 5


      enter image description here






      share|improve this answer


























        10












        10








        10






        The plan is similar to the one I go into in more detail here.



        The Posts table is scanned.



        For each row it extracts the OwnerUserId and LastEditorUserId. This is in a similar manner to the way UNPIVOT works. You see a single constant scan operator in the plan for the below creating the two output rows for each input row.



        SELECT *
        FROM dbo.Posts
        UNPIVOT (X FOR U IN (OwnerUserId,LastEditorUserId)) Unpvt


        In this case the plan is a bit more complex as the semantics for or are that if both column values are the same only one row should be emitted from the join on Users (not two)



        These are then put through the merge interval so that in the event the values are the same the range is collapsed down and only one seek is executed against Users - otherwise two seeks are executed against it.



        The value 62 is a flag meaning that the seek should be an equality seek.



        Regarding




        I don't understand how it has access to these when it hasn't done the
        nested loop join from IX_NC_REPUTATION to the subtree containing
        Expr1010 and Expr1011




        These are defined in the yellow highlighted concatenation operator. This is on the outer side of the yellow highlighted nested loops. So this runs before the yellow highlighted seek on the inside of that nested loops.



        enter image description here



        A rewrite that gives a similar plan (though with the merge interval replaced by a merge union) is below in case this helps.



        SELECT DISTINCT D2.UserId
        FROM dbo.Posts p
        CROSS APPLY (SELECT Users.Id AS UserId
        FROM (SELECT p.OwnerUserId
        UNION /*collapse duplicate to single row*/
        SELECT p.LastEditorUserId) D1(UserId)
        JOIN Users
        ON Users.Id = D1.UserId) D2
        OPTION (FORCE ORDER)


        enter image description here



        Dependant on what indexes are available on the Posts table a variant of this query may be more efficient than your proposed UNION ALL solution. (the copy of the database I have has no useful index for this and the proposed solution does two full scans of Posts. The below does it in one scan)



        WITH Unpivoted AS
        (
        SELECT UserId
        FROM dbo.Posts
        UNPIVOT (UserId FOR U IN (OwnerUserId,LastEditorUserId)) Unpivoted
        )
        SELECT DISTINCT Users.Id
        FROM dbo.Users INNER HASH JOIN Unpivoted
        ON Users.Id = Unpivoted.UserId
        WHERE Users.Reputation = 5


        enter image description here






        share|improve this answer














        The plan is similar to the one I go into in more detail here.



        The Posts table is scanned.



        For each row it extracts the OwnerUserId and LastEditorUserId. This is in a similar manner to the way UNPIVOT works. You see a single constant scan operator in the plan for the below creating the two output rows for each input row.



        SELECT *
        FROM dbo.Posts
        UNPIVOT (X FOR U IN (OwnerUserId,LastEditorUserId)) Unpvt


        In this case the plan is a bit more complex as the semantics for or are that if both column values are the same only one row should be emitted from the join on Users (not two)



        These are then put through the merge interval so that in the event the values are the same the range is collapsed down and only one seek is executed against Users - otherwise two seeks are executed against it.



        The value 62 is a flag meaning that the seek should be an equality seek.



        Regarding




        I don't understand how it has access to these when it hasn't done the
        nested loop join from IX_NC_REPUTATION to the subtree containing
        Expr1010 and Expr1011




        These are defined in the yellow highlighted concatenation operator. This is on the outer side of the yellow highlighted nested loops. So this runs before the yellow highlighted seek on the inside of that nested loops.



        enter image description here



        A rewrite that gives a similar plan (though with the merge interval replaced by a merge union) is below in case this helps.



        SELECT DISTINCT D2.UserId
        FROM dbo.Posts p
        CROSS APPLY (SELECT Users.Id AS UserId
        FROM (SELECT p.OwnerUserId
        UNION /*collapse duplicate to single row*/
        SELECT p.LastEditorUserId) D1(UserId)
        JOIN Users
        ON Users.Id = D1.UserId) D2
        OPTION (FORCE ORDER)


        enter image description here



        Dependant on what indexes are available on the Posts table a variant of this query may be more efficient than your proposed UNION ALL solution. (the copy of the database I have has no useful index for this and the proposed solution does two full scans of Posts. The below does it in one scan)



        WITH Unpivoted AS
        (
        SELECT UserId
        FROM dbo.Posts
        UNPIVOT (UserId FOR U IN (OwnerUserId,LastEditorUserId)) Unpivoted
        )
        SELECT DISTINCT Users.Id
        FROM dbo.Users INNER HASH JOIN Unpivoted
        ON Users.Id = Unpivoted.UserId
        WHERE Users.Reputation = 5


        enter image description here







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 2 days ago

























        answered 2 days ago









        Martin Smith

        61.6k10166246




        61.6k10166246






















            Andrew is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            Andrew is a new contributor. Be nice, and check out our Code of Conduct.













            Andrew is a new contributor. Be nice, and check out our Code of Conduct.












            Andrew is a new contributor. Be nice, and check out our Code of Conduct.
















            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f225835%2fstrange-query-plan-when-using-or-in-join-clause-constant-scan-for-every-row-in%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            If I really need a card on my start hand, how many mulligans make sense? [duplicate]

            Alcedinidae

            Can an atomic nucleus contain both particles and antiparticles? [duplicate]