Correlated Subquery Versus Non-Correlated Subquery
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
add a comment |
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
add a comment |
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
There are three tables that may be used by a business to keep track of the absent days of its employees. It's from a collection of SQL puzzles by Joe Celko. The reader is tasked with concocting a DELETE statement to remove those employees who have accrued 40 or more absent days. And for that purpose we use a subquery. The one proposed by the author is this:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
And my question is, why go for a correlated one? Isn't it just more efficient to run something like one below?
DELETE FROM Personnel
WHERE emp_id =
(SELECT emp_id FROM Absenteeism
GROUP BY emp_id
HAVING SUM(severity_points) >= 40)
Thank you!
sql-server subquery
sql-server subquery
asked 2 days ago
Shams Observer
202
202
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226095%2fcorrelated-subquery-versus-non-correlated-subquery%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
add a comment |
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
add a comment |
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
The non-correlated version you proposed will fail if the subquery returns more than one row (which it likely will). You should use IN
rather than =
here. The purpose of Celko's correlation is to make sure only zero or one rows is returned from the subquery (because it matches and groups on emp_id).
Also, don't forget to alias the table inside the subquery and explicitly reference the alias. Since both tables (Personnel and Absenteeism) have an emp_id column, you want to be very clear where you are referring to which table.
This is more of a "best practice" for readability, but can also prevent logical issues and mistakes in queries. For instance, this query would delete every row in the table:
DELETE FROM Personnel WHERE emp_id IN (SELECT emp_id);
Of course, this may seen like a contrived example, but it happens: sql server 2008 management studio not checking the syntax of my query
So here's the updated query:
DELETE FROM Personnel
WHERE emp_id IN
(SELECT a.emp_id FROM Absenteeism a
GROUP BY a.emp_id
HAVING SUM(a.severity_points) >= 40)
Given those changes: you're right that the correlation isn't strictly necessary in this case.
Logically, it seems like the correlated version might be more efficient, especially if there are lots and lots of rows in the Absenteeism table that don't have a match in the Personnel table (because the unbounded subquery might read more data).
But one would have to test to be sure. The query optimizer can take the declarative query you've written and do many different things with it.
edited yesterday
answered 2 days ago
jadarnel27
3,6501330
3,6501330
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f226095%2fcorrelated-subquery-versus-non-correlated-subquery%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown