Correlated Subquery Versus Non-Correlated Subquery












3














There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:



DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);


And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?



DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)


Thank you!










share|improve this question



























    3














    There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:



    DELETE FROM Personnel
    WHERE emp_id = (SELECT A1.emp_id
    FROM Absenteeism AS A1
    WHERE A1.emp_id = Personnel.emp_id
    GROUP BY A1.emp_id
    HAVING SUM(severity_points) >= 40);


    And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?



    DELETE FROM Personnel
    WHERE emp_id =
    (SELECT emp_id FROM Absenteeism
    GROUP BY emp_id
    HAVING SUM(severity_points) >= 40)


    Thank you!










    share|improve this question

























      3












      3








      3


      2





      There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:



      DELETE FROM Personnel
      WHERE emp_id = (SELECT A1.emp_id
      FROM Absenteeism AS A1
      WHERE A1.emp_id = Personnel.emp_id
      GROUP BY A1.emp_id
      HAVING SUM(severity_points) >= 40);


      And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?



      DELETE FROM Personnel
      WHERE emp_id =
      (SELECT emp_id FROM Absenteeism
      GROUP BY emp_id
      HAVING SUM(severity_points) >= 40)


      Thank you!










      share|improve this question













      There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:



      DELETE FROM Personnel
      WHERE emp_id = (SELECT A1.emp_id
      FROM Absenteeism AS A1
      WHERE A1.emp_id = Personnel.emp_id
      GROUP BY A1.emp_id
      HAVING SUM(severity_points) >= 40);


      And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?



      DELETE FROM Personnel
      WHERE emp_id =
      (SELECT emp_id FROM Absenteeism
      GROUP BY emp_id
      HAVING SUM(severity_points) >= 40)


      Thank you!







      sql-server subquery






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked 2 days ago









      Shams Observer

      202




      202






















          1 Answer
          1






          active

          oldest

          votes


















          3














          The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN rather than = here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).



          Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.



          This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:



          DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);


          Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query



          So here's the updated query:



          DELETE FROM Personnel
          WHERE emp_id IN
          (SELECT a.emp_id FROM Absenteeism a
          GROUP BY a.emp_id
          HAVING SUM(a.severity_points) >= 40)


          Given those changes: you're right that the correlation isn't strictly necessary in this case.



          Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).



          But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.






          share|improve this answer























            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "182"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226095%2fcorrelated-subquery-versus-non-correlated-subquery%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            3














            The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN rather than = here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).



            Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.



            This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:



            DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);


            Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query



            So here's the updated query:



            DELETE FROM Personnel
            WHERE emp_id IN
            (SELECT a.emp_id FROM Absenteeism a
            GROUP BY a.emp_id
            HAVING SUM(a.severity_points) >= 40)


            Given those changes: you're right that the correlation isn't strictly necessary in this case.



            Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).



            But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.






            share|improve this answer




























              3














              The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN rather than = here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).



              Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.



              This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:



              DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);


              Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query



              So here's the updated query:



              DELETE FROM Personnel
              WHERE emp_id IN
              (SELECT a.emp_id FROM Absenteeism a
              GROUP BY a.emp_id
              HAVING SUM(a.severity_points) >= 40)


              Given those changes: you're right that the correlation isn't strictly necessary in this case.



              Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).



              But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.






              share|improve this answer


























                3












                3








                3






                The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN rather than = here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).



                Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.



                This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:



                DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);


                Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query



                So here's the updated query:



                DELETE FROM Personnel
                WHERE emp_id IN
                (SELECT a.emp_id FROM Absenteeism a
                GROUP BY a.emp_id
                HAVING SUM(a.severity_points) >= 40)


                Given those changes: you're right that the correlation isn't strictly necessary in this case.



                Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).



                But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.






                share|improve this answer














                The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN rather than = here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).



                Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.



                This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:



                DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);


                Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query



                So here's the updated query:



                DELETE FROM Personnel
                WHERE emp_id IN
                (SELECT a.emp_id FROM Absenteeism a
                GROUP BY a.emp_id
                HAVING SUM(a.severity_points) >= 40)


                Given those changes: you're right that the correlation isn't strictly necessary in this case.



                Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).



                But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited yesterday

























                answered 2 days ago









                jadarnel27

                3,6501330




                3,6501330






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Database Administrators Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226095%2fcorrelated-subquery-versus-non-correlated-subquery%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

                    Alcedinidae

                    RAC Tourist Trophy