Joining ~ 40 normalized tables for 200 columns to display detailed user profile. Can this be optimized?
A profile can have many profile types. That profile can have many fields, which are all tied to lookup tables. Some fields can have many selections.
profile -> profile_date -> one relationship status, one zodiac, many languages, many countries traveled, many cuisines, many activities, ...
profile -> profile_network -> one company, one school, ...
profile -> ...
A lookup table for the fields looks like this:
id type
1 ♈ ARIES
2 ♉ TAURUS
3 ♊ GEMINI
4 ♋ CANCER
5 ♌ LEO
6 ♍ VIRGO
7 ♎ LIBRA
8 ♏ SCORPIO
9 ♐ SAGITTARIUS
10 ♑ CAPRICORN
11 ♒ AQUARIUS
12 ♓ PISCES
If the field is a single selection, we join the lookup table and we're done.
-> profile_date (zodiac_type_id) -> zodiac_type.id
If the field is a multiple selection, there's a join table in between for the many-to-many.
-> profile_date -> (profile_date_id) profile_date_languages (language_type_id) -> language_type.id
In addition, I am joining other things like this to get all of their photos:
profile -> personas -> photos
In addition, the query also filters for things like persons who like each other, etc.
The query seems to start performing poorly with just a low number of profiles, like less than 10. If a single person has an obnoxious profile where they check all the multiple selections e.g. speak every language, been to every country, it grinds the database to crippled performance for everyone if their mega profile ends up being fetched.
That's the idea. You can see more of the fields here:
Sample query:
https://pastebin.com/wug86Wrr
Sample query plan:
http://www.sharecsv.com/s/a6333a7f610e049dd81ebcfc19a4c02f/temp_file_limit_exceeded.csv
Users will always be filtering by at least profile.birth_date (age), profile.radius (geolocation)
. Paid users can filter by any field. :)
PostgreSQL 9.6
Some ideas I had were:
Optimize this query (I am not sure what can be done, am mostly wondering if I'm just doing something really stupid/unreasonable)
Right now I am joining many tables per card, so the initial load is slower, but the data is all there if the user decides to explore and things look instant to them. I could instead fetch the minimum preview for the cards (join few tables per card), then only join the full profile (1 card filtered by id, many/the rest of the joins) if the user clicks on the preview to see the detail view (but I feel this is not the best user experience because the user has to now wait for the load every time they click a profile which is more noticeable). I am not sure this would even help because users can also filter by these fields, so I believe I would need to join them beforehand.
Use ElasticSearch (I feel like I wouldn't need this until many more profiles?) I've done something similar to this with Algolia and it handled it like a champ, but I imagine this can just be done with PostgreSQL.
postgresql database-design
add a comment |
A profile can have many profile types. That profile can have many fields, which are all tied to lookup tables. Some fields can have many selections.
profile -> profile_date -> one relationship status, one zodiac, many languages, many countries traveled, many cuisines, many activities, ...
profile -> profile_network -> one company, one school, ...
profile -> ...
A lookup table for the fields looks like this:
id type
1 ♈ ARIES
2 ♉ TAURUS
3 ♊ GEMINI
4 ♋ CANCER
5 ♌ LEO
6 ♍ VIRGO
7 ♎ LIBRA
8 ♏ SCORPIO
9 ♐ SAGITTARIUS
10 ♑ CAPRICORN
11 ♒ AQUARIUS
12 ♓ PISCES
If the field is a single selection, we join the lookup table and we're done.
-> profile_date (zodiac_type_id) -> zodiac_type.id
If the field is a multiple selection, there's a join table in between for the many-to-many.
-> profile_date -> (profile_date_id) profile_date_languages (language_type_id) -> language_type.id
In addition, I am joining other things like this to get all of their photos:
profile -> personas -> photos
In addition, the query also filters for things like persons who like each other, etc.
The query seems to start performing poorly with just a low number of profiles, like less than 10. If a single person has an obnoxious profile where they check all the multiple selections e.g. speak every language, been to every country, it grinds the database to crippled performance for everyone if their mega profile ends up being fetched.
That's the idea. You can see more of the fields here:
Sample query:
https://pastebin.com/wug86Wrr
Sample query plan:
http://www.sharecsv.com/s/a6333a7f610e049dd81ebcfc19a4c02f/temp_file_limit_exceeded.csv
Users will always be filtering by at least profile.birth_date (age), profile.radius (geolocation)
. Paid users can filter by any field. :)
PostgreSQL 9.6
Some ideas I had were:
Optimize this query (I am not sure what can be done, am mostly wondering if I'm just doing something really stupid/unreasonable)
Right now I am joining many tables per card, so the initial load is slower, but the data is all there if the user decides to explore and things look instant to them. I could instead fetch the minimum preview for the cards (join few tables per card), then only join the full profile (1 card filtered by id, many/the rest of the joins) if the user clicks on the preview to see the detail view (but I feel this is not the best user experience because the user has to now wait for the load every time they click a profile which is more noticeable). I am not sure this would even help because users can also filter by these fields, so I believe I would need to join them beforehand.
Use ElasticSearch (I feel like I wouldn't need this until many more profiles?) I've done something similar to this with Algolia and it handled it like a champ, but I imagine this can just be done with PostgreSQL.
postgresql database-design
1
You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table and your WHERE clause seems to not be limiting the result very much. No wonder you get very low performance of this query.
– ypercubeᵀᴹ
Mar 24 at 17:04
1
Note that SQL generates a cartesian product for all child records for 1 to n relations (i.e. all possible combinations of all child records) . So, if child1 has 10 records, child2 4 records and child3 5 records for one main record, you will get a 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness!
– Olivier Jacot-Descombes
Mar 24 at 18:22
2
You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id.
– philipxy
Mar 24 at 20:03
1
If you have a main person recordp
speaking languages{l1, l2}
who visited countries{c1, c2, c3}
and you join them, the result will be{ {p, l1, c1}, {p, l1, c2}, {p, l1, c3}, {p, l2, c1}, {p, l2, c2}, {p, l2, c3}, }
. Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now, the returned records are on the n-side of the relation. Make a small experiment, and you will see.
– Olivier Jacot-Descombes
Mar 25 at 12:51
add a comment |
A profile can have many profile types. That profile can have many fields, which are all tied to lookup tables. Some fields can have many selections.
profile -> profile_date -> one relationship status, one zodiac, many languages, many countries traveled, many cuisines, many activities, ...
profile -> profile_network -> one company, one school, ...
profile -> ...
A lookup table for the fields looks like this:
id type
1 ♈ ARIES
2 ♉ TAURUS
3 ♊ GEMINI
4 ♋ CANCER
5 ♌ LEO
6 ♍ VIRGO
7 ♎ LIBRA
8 ♏ SCORPIO
9 ♐ SAGITTARIUS
10 ♑ CAPRICORN
11 ♒ AQUARIUS
12 ♓ PISCES
If the field is a single selection, we join the lookup table and we're done.
-> profile_date (zodiac_type_id) -> zodiac_type.id
If the field is a multiple selection, there's a join table in between for the many-to-many.
-> profile_date -> (profile_date_id) profile_date_languages (language_type_id) -> language_type.id
In addition, I am joining other things like this to get all of their photos:
profile -> personas -> photos
In addition, the query also filters for things like persons who like each other, etc.
The query seems to start performing poorly with just a low number of profiles, like less than 10. If a single person has an obnoxious profile where they check all the multiple selections e.g. speak every language, been to every country, it grinds the database to crippled performance for everyone if their mega profile ends up being fetched.
That's the idea. You can see more of the fields here:
Sample query:
https://pastebin.com/wug86Wrr
Sample query plan:
http://www.sharecsv.com/s/a6333a7f610e049dd81ebcfc19a4c02f/temp_file_limit_exceeded.csv
Users will always be filtering by at least profile.birth_date (age), profile.radius (geolocation)
. Paid users can filter by any field. :)
PostgreSQL 9.6
Some ideas I had were:
Optimize this query (I am not sure what can be done, am mostly wondering if I'm just doing something really stupid/unreasonable)
Right now I am joining many tables per card, so the initial load is slower, but the data is all there if the user decides to explore and things look instant to them. I could instead fetch the minimum preview for the cards (join few tables per card), then only join the full profile (1 card filtered by id, many/the rest of the joins) if the user clicks on the preview to see the detail view (but I feel this is not the best user experience because the user has to now wait for the load every time they click a profile which is more noticeable). I am not sure this would even help because users can also filter by these fields, so I believe I would need to join them beforehand.
Use ElasticSearch (I feel like I wouldn't need this until many more profiles?) I've done something similar to this with Algolia and it handled it like a champ, but I imagine this can just be done with PostgreSQL.
postgresql database-design
A profile can have many profile types. That profile can have many fields, which are all tied to lookup tables. Some fields can have many selections.
profile -> profile_date -> one relationship status, one zodiac, many languages, many countries traveled, many cuisines, many activities, ...
profile -> profile_network -> one company, one school, ...
profile -> ...
A lookup table for the fields looks like this:
id type
1 ♈ ARIES
2 ♉ TAURUS
3 ♊ GEMINI
4 ♋ CANCER
5 ♌ LEO
6 ♍ VIRGO
7 ♎ LIBRA
8 ♏ SCORPIO
9 ♐ SAGITTARIUS
10 ♑ CAPRICORN
11 ♒ AQUARIUS
12 ♓ PISCES
If the field is a single selection, we join the lookup table and we're done.
-> profile_date (zodiac_type_id) -> zodiac_type.id
If the field is a multiple selection, there's a join table in between for the many-to-many.
-> profile_date -> (profile_date_id) profile_date_languages (language_type_id) -> language_type.id
In addition, I am joining other things like this to get all of their photos:
profile -> personas -> photos
In addition, the query also filters for things like persons who like each other, etc.
The query seems to start performing poorly with just a low number of profiles, like less than 10. If a single person has an obnoxious profile where they check all the multiple selections e.g. speak every language, been to every country, it grinds the database to crippled performance for everyone if their mega profile ends up being fetched.
That's the idea. You can see more of the fields here:
Sample query:
https://pastebin.com/wug86Wrr
Sample query plan:
http://www.sharecsv.com/s/a6333a7f610e049dd81ebcfc19a4c02f/temp_file_limit_exceeded.csv
Users will always be filtering by at least profile.birth_date (age), profile.radius (geolocation)
. Paid users can filter by any field. :)
PostgreSQL 9.6
Some ideas I had were:
Optimize this query (I am not sure what can be done, am mostly wondering if I'm just doing something really stupid/unreasonable)
Right now I am joining many tables per card, so the initial load is slower, but the data is all there if the user decides to explore and things look instant to them. I could instead fetch the minimum preview for the cards (join few tables per card), then only join the full profile (1 card filtered by id, many/the rest of the joins) if the user clicks on the preview to see the detail view (but I feel this is not the best user experience because the user has to now wait for the load every time they click a profile which is more noticeable). I am not sure this would even help because users can also filter by these fields, so I believe I would need to join them beforehand.
Use ElasticSearch (I feel like I wouldn't need this until many more profiles?) I've done something similar to this with Algolia and it handled it like a champ, but I imagine this can just be done with PostgreSQL.
postgresql database-design
postgresql database-design
edited Mar 24 at 21:54
David Browne - Microsoft
12.2k729
12.2k729
asked Mar 24 at 14:42
atkaylaatkayla
1085
1085
1
You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table and your WHERE clause seems to not be limiting the result very much. No wonder you get very low performance of this query.
– ypercubeᵀᴹ
Mar 24 at 17:04
1
Note that SQL generates a cartesian product for all child records for 1 to n relations (i.e. all possible combinations of all child records) . So, if child1 has 10 records, child2 4 records and child3 5 records for one main record, you will get a 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness!
– Olivier Jacot-Descombes
Mar 24 at 18:22
2
You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id.
– philipxy
Mar 24 at 20:03
1
If you have a main person recordp
speaking languages{l1, l2}
who visited countries{c1, c2, c3}
and you join them, the result will be{ {p, l1, c1}, {p, l1, c2}, {p, l1, c3}, {p, l2, c1}, {p, l2, c2}, {p, l2, c3}, }
. Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now, the returned records are on the n-side of the relation. Make a small experiment, and you will see.
– Olivier Jacot-Descombes
Mar 25 at 12:51
add a comment |
1
You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table and your WHERE clause seems to not be limiting the result very much. No wonder you get very low performance of this query.
– ypercubeᵀᴹ
Mar 24 at 17:04
1
Note that SQL generates a cartesian product for all child records for 1 to n relations (i.e. all possible combinations of all child records) . So, if child1 has 10 records, child2 4 records and child3 5 records for one main record, you will get a 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness!
– Olivier Jacot-Descombes
Mar 24 at 18:22
2
You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id.
– philipxy
Mar 24 at 20:03
1
If you have a main person recordp
speaking languages{l1, l2}
who visited countries{c1, c2, c3}
and you join them, the result will be{ {p, l1, c1}, {p, l1, c2}, {p, l1, c3}, {p, l2, c1}, {p, l2, c2}, {p, l2, c3}, }
. Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now, the returned records are on the n-side of the relation. Make a small experiment, and you will see.
– Olivier Jacot-Descombes
Mar 25 at 12:51
1
1
You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table and your WHERE clause seems to not be limiting the result very much. No wonder you get very low performance of this query.
– ypercubeᵀᴹ
Mar 24 at 17:04
You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table and your WHERE clause seems to not be limiting the result very much. No wonder you get very low performance of this query.
– ypercubeᵀᴹ
Mar 24 at 17:04
1
1
Note that SQL generates a cartesian product for all child records for 1 to n relations (i.e. all possible combinations of all child records) . So, if child1 has 10 records, child2 4 records and child3 5 records for one main record, you will get a 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness!
– Olivier Jacot-Descombes
Mar 24 at 18:22
Note that SQL generates a cartesian product for all child records for 1 to n relations (i.e. all possible combinations of all child records) . So, if child1 has 10 records, child2 4 records and child3 5 records for one main record, you will get a 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness!
– Olivier Jacot-Descombes
Mar 24 at 18:22
2
2
You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id.
– philipxy
Mar 24 at 20:03
You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id.
– philipxy
Mar 24 at 20:03
1
1
If you have a main person record
p
speaking languages {l1, l2}
who visited countries {c1, c2, c3}
and you join them, the result will be { {p, l1, c1}, {p, l1, c2}, {p, l1, c3}, {p, l2, c1}, {p, l2, c2}, {p, l2, c3}, }
. Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now, the returned records are on the n-side of the relation. Make a small experiment, and you will see.– Olivier Jacot-Descombes
Mar 25 at 12:51
If you have a main person record
p
speaking languages {l1, l2}
who visited countries {c1, c2, c3}
and you join them, the result will be { {p, l1, c1}, {p, l1, c2}, {p, l1, c3}, {p, l2, c1}, {p, l2, c2}, {p, l2, c3}, }
. Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now, the returned records are on the n-side of the relation. Make a small experiment, and you will see.– Olivier Jacot-Descombes
Mar 25 at 12:51
add a comment |
2 Answers
2
active
oldest
votes
Option 4 - correct your schema design and stop using 'IDs' or whatever other surrogate key, the zodiac table is NOT a lookup table. it's a relation that holds all zodiac signs, and the way you tell the zodiac signs apart is their name, therefore, the key should be:
CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);
Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do that to all the other one-to many as well, like relationship status, company, school etc. and you just saved yourself most of the joins.
For the many-to-many, like languages and countries, so the same, and you will now need only one join instead of two for each relation.
Give it a try and see how all your queries magically shrink by 90%, and perform orders of magnitude better. you will also explicitly maintain uniqueness and prevent logical duplicates, and you will also see how filters perform orders of magnitude better as the optimizer will be able to correctly evaluate distribution of the actual value, not some random ID from another table.
BTW - there are no fields in a SQL database, there are columns and rows only.
These are the counterparts of Tuples and attributes in the relational model
HTH
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
2
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail andPro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!
– atkayla
Mar 24 at 17:33
2
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
|
show 4 more comments
Natural key, surrogate key? Each has advantages. Depends on the use case. The decision should be based on reason, not on ideology.
If natural keys are getting longer, surrogate keys start to look better. Size matters, especially if resources are limited and tables grow big. Small tables are fast anyway (unless you seriously mess up).
If you have limited RAM (you mentioned 3.75GB in a comment) and many rows in the main table, the sheer size starts to weigh in. Storing varchar(50)
1M times means up to 50+ MB on disk (and a bit more in RAM). If you have 100 columns like that, your rows grow huge and performance suffers. Replace it with a 1, 2 or 4 byte surrogate key to make that 1 - 4 MB per column instead.
The zodiac signs in your example table are short, descriptive and immutable. Perfect candidates for a natural key. Not much to gain with a surrogate key. The immutable nature of zodiac signs and the absence of any additional attributes (columns) also make it a perfect candidate for an enum
type, which is simple to handle, just as fast and only occupies 4 bytes on disk.
I prepared a quick demo to compare:
Comparing 3 variants for your simple "zodiac" case:
- Surrogate keys
- Natural keys
- Enums
db<>fiddle here
Enums seem like the best choice in this particular case. I am not a big fan of enums in general, as they make it harder to add/remove rows and impossible to add more columns to the lookup values.
Most probably there are other problems in your query, but it's too big to represent a question. Would be an extended code audit to go in deep.
About the 1-byte "char"
type I used in the demo:
- How to store one-byte integer in PostgreSQL?
1
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232952%2fjoining-40-normalized-tables-for-200-columns-to-display-detailed-user-profile%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Option 4 - correct your schema design and stop using 'IDs' or whatever other surrogate key, the zodiac table is NOT a lookup table. it's a relation that holds all zodiac signs, and the way you tell the zodiac signs apart is their name, therefore, the key should be:
CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);
Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do that to all the other one-to many as well, like relationship status, company, school etc. and you just saved yourself most of the joins.
For the many-to-many, like languages and countries, so the same, and you will now need only one join instead of two for each relation.
Give it a try and see how all your queries magically shrink by 90%, and perform orders of magnitude better. you will also explicitly maintain uniqueness and prevent logical duplicates, and you will also see how filters perform orders of magnitude better as the optimizer will be able to correctly evaluate distribution of the actual value, not some random ID from another table.
BTW - there are no fields in a SQL database, there are columns and rows only.
These are the counterparts of Tuples and attributes in the relational model
HTH
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
2
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail andPro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!
– atkayla
Mar 24 at 17:33
2
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
|
show 4 more comments
Option 4 - correct your schema design and stop using 'IDs' or whatever other surrogate key, the zodiac table is NOT a lookup table. it's a relation that holds all zodiac signs, and the way you tell the zodiac signs apart is their name, therefore, the key should be:
CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);
Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do that to all the other one-to many as well, like relationship status, company, school etc. and you just saved yourself most of the joins.
For the many-to-many, like languages and countries, so the same, and you will now need only one join instead of two for each relation.
Give it a try and see how all your queries magically shrink by 90%, and perform orders of magnitude better. you will also explicitly maintain uniqueness and prevent logical duplicates, and you will also see how filters perform orders of magnitude better as the optimizer will be able to correctly evaluate distribution of the actual value, not some random ID from another table.
BTW - there are no fields in a SQL database, there are columns and rows only.
These are the counterparts of Tuples and attributes in the relational model
HTH
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
2
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail andPro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!
– atkayla
Mar 24 at 17:33
2
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
|
show 4 more comments
Option 4 - correct your schema design and stop using 'IDs' or whatever other surrogate key, the zodiac table is NOT a lookup table. it's a relation that holds all zodiac signs, and the way you tell the zodiac signs apart is their name, therefore, the key should be:
CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);
Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do that to all the other one-to many as well, like relationship status, company, school etc. and you just saved yourself most of the joins.
For the many-to-many, like languages and countries, so the same, and you will now need only one join instead of two for each relation.
Give it a try and see how all your queries magically shrink by 90%, and perform orders of magnitude better. you will also explicitly maintain uniqueness and prevent logical duplicates, and you will also see how filters perform orders of magnitude better as the optimizer will be able to correctly evaluate distribution of the actual value, not some random ID from another table.
BTW - there are no fields in a SQL database, there are columns and rows only.
These are the counterparts of Tuples and attributes in the relational model
HTH
Option 4 - correct your schema design and stop using 'IDs' or whatever other surrogate key, the zodiac table is NOT a lookup table. it's a relation that holds all zodiac signs, and the way you tell the zodiac signs apart is their name, therefore, the key should be:
CREATE TABLE Zodiacs (Zodiac VARCHAR(11) PRIMARY KEY);
Use the Zodiac as the foreign key in your profile table, and now you don't need a join at all.
Do that to all the other one-to many as well, like relationship status, company, school etc. and you just saved yourself most of the joins.
For the many-to-many, like languages and countries, so the same, and you will now need only one join instead of two for each relation.
Give it a try and see how all your queries magically shrink by 90%, and perform orders of magnitude better. you will also explicitly maintain uniqueness and prevent logical duplicates, and you will also see how filters perform orders of magnitude better as the optimizer will be able to correctly evaluate distribution of the actual value, not some random ID from another table.
BTW - there are no fields in a SQL database, there are columns and rows only.
These are the counterparts of Tuples and attributes in the relational model
HTH
edited Mar 24 at 18:25
answered Mar 24 at 15:38
SQLRaptorSQLRaptor
2,6011320
2,6011320
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
2
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail andPro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!
– atkayla
Mar 24 at 17:33
2
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
|
show 4 more comments
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
2
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail andPro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!
– atkayla
Mar 24 at 17:33
2
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
Oh man. Drawing it on the whiteboard, I clearly see how this would eliminate each of the joins to one, and 1 less from each of the joins to many. I can see how this would make it significantly faster/better/cleaner. Before I built this thing out, I read various questions and even posted one of my own to the topic of natural key vs. join surrogate key. The consensus seems to be to use that surrogate key id, but seeing this now before me it makes no sense to do that ridiculous amount of those extra joins.
– atkayla
Mar 24 at 16:23
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
I was wondering if you could elaborate for me why the recommendations people make to use the surrogate id key don't really apply to my use case. I read the information and still made the wrong decision, so I want to make sure I really understand. - stackoverflow.com/questions/54844992/… - stackoverflow.com/questions/337503/… - stackoverflow.com/questions/63090/… It seems like the cons of my surrogate key id far outweighed pros.
– atkayla
Mar 24 at 16:24
2
2
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
"The recommendations that people make" are based on myths that go back 50 years, from the days where magnetic tapes required physical address pointers. It has become so widespread, that people rarely stop and think about it, and just take it for granted. It doesn't apply to your case only, it applies to EVERY relational database on the planet. Watch this video to get more answers: youtube.com/watch?v=r2MDDPDi_Pg
– SQLRaptor
Mar 24 at 16:29
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail and
Pro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!– atkayla
Mar 24 at 17:33
I really enjoyed your talk and learned a lot! I love how it's about this exact topic in great detail and
Pro Artificial vs. Pro Natural
, the 99.5% vs. the 0.5%, haha. I mean with those Pro Artifical (id, type) lookups, what is recommended as the "de-facto standard", I'm already seeing issues with just 10 profiles in my database during an alpha test. You just got a new convert. :) I will refactor my entire schema and report back with whether I think it was a good decision or not, although I'm sure it will be due to the obvious elimination of so many joins. Just need to see for myself!– atkayla
Mar 24 at 17:33
2
2
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
+1 for the clarification! :-)
– Vérace
Mar 24 at 18:31
|
show 4 more comments
Natural key, surrogate key? Each has advantages. Depends on the use case. The decision should be based on reason, not on ideology.
If natural keys are getting longer, surrogate keys start to look better. Size matters, especially if resources are limited and tables grow big. Small tables are fast anyway (unless you seriously mess up).
If you have limited RAM (you mentioned 3.75GB in a comment) and many rows in the main table, the sheer size starts to weigh in. Storing varchar(50)
1M times means up to 50+ MB on disk (and a bit more in RAM). If you have 100 columns like that, your rows grow huge and performance suffers. Replace it with a 1, 2 or 4 byte surrogate key to make that 1 - 4 MB per column instead.
The zodiac signs in your example table are short, descriptive and immutable. Perfect candidates for a natural key. Not much to gain with a surrogate key. The immutable nature of zodiac signs and the absence of any additional attributes (columns) also make it a perfect candidate for an enum
type, which is simple to handle, just as fast and only occupies 4 bytes on disk.
I prepared a quick demo to compare:
Comparing 3 variants for your simple "zodiac" case:
- Surrogate keys
- Natural keys
- Enums
db<>fiddle here
Enums seem like the best choice in this particular case. I am not a big fan of enums in general, as they make it harder to add/remove rows and impossible to add more columns to the lookup values.
Most probably there are other problems in your query, but it's too big to represent a question. Would be an extended code audit to go in deep.
About the 1-byte "char"
type I used in the demo:
- How to store one-byte integer in PostgreSQL?
1
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
add a comment |
Natural key, surrogate key? Each has advantages. Depends on the use case. The decision should be based on reason, not on ideology.
If natural keys are getting longer, surrogate keys start to look better. Size matters, especially if resources are limited and tables grow big. Small tables are fast anyway (unless you seriously mess up).
If you have limited RAM (you mentioned 3.75GB in a comment) and many rows in the main table, the sheer size starts to weigh in. Storing varchar(50)
1M times means up to 50+ MB on disk (and a bit more in RAM). If you have 100 columns like that, your rows grow huge and performance suffers. Replace it with a 1, 2 or 4 byte surrogate key to make that 1 - 4 MB per column instead.
The zodiac signs in your example table are short, descriptive and immutable. Perfect candidates for a natural key. Not much to gain with a surrogate key. The immutable nature of zodiac signs and the absence of any additional attributes (columns) also make it a perfect candidate for an enum
type, which is simple to handle, just as fast and only occupies 4 bytes on disk.
I prepared a quick demo to compare:
Comparing 3 variants for your simple "zodiac" case:
- Surrogate keys
- Natural keys
- Enums
db<>fiddle here
Enums seem like the best choice in this particular case. I am not a big fan of enums in general, as they make it harder to add/remove rows and impossible to add more columns to the lookup values.
Most probably there are other problems in your query, but it's too big to represent a question. Would be an extended code audit to go in deep.
About the 1-byte "char"
type I used in the demo:
- How to store one-byte integer in PostgreSQL?
1
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
add a comment |
Natural key, surrogate key? Each has advantages. Depends on the use case. The decision should be based on reason, not on ideology.
If natural keys are getting longer, surrogate keys start to look better. Size matters, especially if resources are limited and tables grow big. Small tables are fast anyway (unless you seriously mess up).
If you have limited RAM (you mentioned 3.75GB in a comment) and many rows in the main table, the sheer size starts to weigh in. Storing varchar(50)
1M times means up to 50+ MB on disk (and a bit more in RAM). If you have 100 columns like that, your rows grow huge and performance suffers. Replace it with a 1, 2 or 4 byte surrogate key to make that 1 - 4 MB per column instead.
The zodiac signs in your example table are short, descriptive and immutable. Perfect candidates for a natural key. Not much to gain with a surrogate key. The immutable nature of zodiac signs and the absence of any additional attributes (columns) also make it a perfect candidate for an enum
type, which is simple to handle, just as fast and only occupies 4 bytes on disk.
I prepared a quick demo to compare:
Comparing 3 variants for your simple "zodiac" case:
- Surrogate keys
- Natural keys
- Enums
db<>fiddle here
Enums seem like the best choice in this particular case. I am not a big fan of enums in general, as they make it harder to add/remove rows and impossible to add more columns to the lookup values.
Most probably there are other problems in your query, but it's too big to represent a question. Would be an extended code audit to go in deep.
About the 1-byte "char"
type I used in the demo:
- How to store one-byte integer in PostgreSQL?
Natural key, surrogate key? Each has advantages. Depends on the use case. The decision should be based on reason, not on ideology.
If natural keys are getting longer, surrogate keys start to look better. Size matters, especially if resources are limited and tables grow big. Small tables are fast anyway (unless you seriously mess up).
If you have limited RAM (you mentioned 3.75GB in a comment) and many rows in the main table, the sheer size starts to weigh in. Storing varchar(50)
1M times means up to 50+ MB on disk (and a bit more in RAM). If you have 100 columns like that, your rows grow huge and performance suffers. Replace it with a 1, 2 or 4 byte surrogate key to make that 1 - 4 MB per column instead.
The zodiac signs in your example table are short, descriptive and immutable. Perfect candidates for a natural key. Not much to gain with a surrogate key. The immutable nature of zodiac signs and the absence of any additional attributes (columns) also make it a perfect candidate for an enum
type, which is simple to handle, just as fast and only occupies 4 bytes on disk.
I prepared a quick demo to compare:
Comparing 3 variants for your simple "zodiac" case:
- Surrogate keys
- Natural keys
- Enums
db<>fiddle here
Enums seem like the best choice in this particular case. I am not a big fan of enums in general, as they make it harder to add/remove rows and impossible to add more columns to the lookup values.
Most probably there are other problems in your query, but it's too big to represent a question. Would be an extended code audit to go in deep.
About the 1-byte "char"
type I used in the demo:
- How to store one-byte integer in PostgreSQL?
edited Mar 25 at 13:58
answered Mar 25 at 2:30
Erwin BrandstetterErwin Brandstetter
95.3k9185300
95.3k9185300
1
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
add a comment |
1
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
1
1
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
It is true that natural keys MIGHT take up more storage, but that is only one aspect of the overall impact they have on performance, and looking just at any one aspect, may lead to wrong conclusions. For example, regarding memory consumption - when using surrogates, forcing a join, memory has to load pages from both tables, consuming significantly more memory space than loading pages from just one table.
– SQLRaptor
Mar 25 at 16:47
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
Small lookup tables factor in as a single data page and hardly matter in that respect in connection with big tables. That calculation changes with tables of comparable size / cardinality, but there you typically need to join in any case for additional columns.
– Erwin Brandstetter
Mar 25 at 16:58
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232952%2fjoining-40-normalized-tables-for-200-columns-to-display-detailed-user-profile%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
You have 20+ joins to tables that have 1-to-many relationships with your main (profile) table and your WHERE clause seems to not be limiting the result very much. No wonder you get very low performance of this query.
– ypercubeᵀᴹ
Mar 24 at 17:04
1
Note that SQL generates a cartesian product for all child records for 1 to n relations (i.e. all possible combinations of all child records) . So, if child1 has 10 records, child2 4 records and child3 5 records for one main record, you will get a 10 * 4 * 5 = 200 resulting records just for this single main record! And you are joining 40 tables. My goodness!
– Olivier Jacot-Descombes
Mar 24 at 18:22
2
You cannot have reasonable performance without (implicitly or explicitly) declared constraints & indexes. Regardless of whether you want the surrogates, you should be declaring the other constraints--CKs, UNIQUEs, NOT NULLs & FKs--and associated & other relevant indexes where your DBMS doesn't. Depending on your DBMS, you are likely to get the same performance whether or not you keep the surrogates. PS That lookup table should be id-symbol-name, with three CKs. And as @SQLRaptor says, the symbols & names exist to be unique ids--they are surrogate ids--you don't need to add a surrogate id.
– philipxy
Mar 24 at 20:03
1
If you have a main person record
p
speaking languages{l1, l2}
who visited countries{c1, c2, c3}
and you join them, the result will be{ {p, l1, c1}, {p, l1, c2}, {p, l1, c3}, {p, l2, c1}, {p, l2, c2}, {p, l2, c3}, }
. Since SQL does not know which language is related to which country, it simply returns every possible combination of them. If you join this to lookup tables, the number of records does not increase, since now, the returned records are on the n-side of the relation. Make a small experiment, and you will see.– Olivier Jacot-Descombes
Mar 25 at 12:51