ETL - determine deleted entries - Oracle 12c
I am summarizing my use case in basic form below. Any suggestions would be very much appreciated.
ETL:
                      |-> FINAL DB
SOURCE -> INTERIM DB -|
                      |-> HISTORY DB
HISTORY DB is updated with the diff between the entries in INTERIM DB and FINAL DB.
FINAL DB is updated using Oracle's MERGE statement.
On Day 1, we extract data from SOURCE and load 10 entries into INTERIM DB.
We run a PL/SQL flow that loads 10 entries into each of FINAL DB and HISTORY DB.
On Day 2, another 10 entries come in. Compared with Day 1, there is 1 update, 1 delete and 1 insert.
So FINAL DB now has 11 entries in total (the earlier 10 + 1 insert), and HISTORY DB has 13 entries, all stored as inserts (the earlier 10 + 1 update + 1 delete + 1 insert).
We use the HISTORY DB entries with DELETE status to delete entries from FINAL DB.
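The Day 1 / Day 2 walkthrough can be reproduced with a minimal Python sketch. It models each table as a dict keyed by a hypothetical primary key, treats a row that is in FINAL but missing from INTERIM as a DELETE (valid only for a full extract), and mimics a MERGE that updates and inserts but never deletes. The function names are illustrative, not part of any Oracle API.

```python
def diff_snapshot(interim, final):
    """Compare a FULL extract (interim) with FINAL and return HISTORY rows.

    Assumption: a key present in FINAL but absent from INTERIM was
    deleted at the source. This only holds when INTERIM is complete.
    """
    history = []
    for key, row in interim.items():
        if key not in final:
            history.append((key, "INSERT"))
        elif final[key] != row:
            history.append((key, "UPDATE"))
    for key in final:
        if key not in interim:
            history.append((key, "DELETE"))
    return history


def merge_upsert(interim, final):
    """Mimic MERGE ... WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT.

    Rows missing from INTERIM are left untouched, like the MERGE in the
    question, which does not delete anything.
    """
    merged = dict(final)
    merged.update(interim)
    return merged


# Day 1: 10 rows arrive, FINAL starts empty.
day1 = {i: f"v1-{i}" for i in range(1, 11)}
final, history = {}, []
history += diff_snapshot(day1, final)
final = merge_upsert(day1, final)

# Day 2: full extract with 1 update, 1 delete and 1 insert.
day2 = dict(day1)
day2[1] = "v2-1"    # update
del day2[2]         # delete (row is simply absent from the extract)
day2[11] = "v1-11"  # insert
history += diff_snapshot(day2, final)
final = merge_upsert(day2, final)
print(len(final), len(history))  # 11 13

# Apply the DELETE statuses from HISTORY to FINAL.
for key, status in history:
    if status == "DELETE":
        final.pop(key, None)
print(len(final))  # 10
```

The counts match the question: 11 rows in FINAL and 13 in HISTORY after the MERGE, and 10 rows once the DELETE entries are applied.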
This approach works as long as we pull all the entries from SOURCE.
If we pull only the delta (the entries that changed), we get incorrect deletes. For example, if instead of fetching all 10 entries (as on Day 2) I fetch only the 2 changed entries (1 update and 1 insert), there is no way to identify the real DELETE in the HISTORY table, because the remaining 8 entries that were not fetched are also marked as DELETE.
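Running the same full-extract comparison against a delta extract shows the failure mode. In this sketch (the same hypothetical dict model), only row 1 (updated) and row 11 (inserted) are extracted, so every other key in FINAL looks deleted.

```python
def diff_snapshot(interim, final):
    """Full-extract comparison: anything in FINAL but not in INTERIM is a DELETE."""
    history = []
    for key, row in interim.items():
        if key not in final:
            history.append((key, "INSERT"))
        elif final[key] != row:
            history.append((key, "UPDATE"))
    for key in final:
        if key not in interim:
            history.append((key, "DELETE"))
    return history


# FINAL after Day 1 (hypothetical row values).
final = {i: f"v1-{i}" for i in range(1, 11)}
# Delta extract on Day 2: only the changed rows (row 2 was really deleted).
delta = {1: "v2-1", 11: "v1-11"}

history = diff_snapshot(delta, final)
deletes = sorted(k for k, s in history if s == "DELETE")
print(deletes)  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Row 2, the genuine delete, cannot be told apart from rows 3 to 10, which simply were not extracted.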
Question:
The real scenario involves millions of records, so we cannot fetch all the data into INTERIM DB. The Oracle MERGE statement merges the changes into FINAL DB but does not perform the deletes; we plan to handle those by running additional DELETE statements driven by the history-table entries with status 'DELETE'. The question is: how can we get the correct entries into the history table when INTERIM DB holds only delta data?
One solution:
Fetch the full data periodically to determine the deletes.
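Detecting deletes only requires comparing keys, so one possible variant of this solution (an assumption, not something stated in the question) is to keep applying deltas with MERGE and periodically reconcile FINAL against a full extract of just the primary keys. In SQL this comparison is a set difference, for example MINUS or NOT EXISTS; the Python sketch below shows the same logic on a hypothetical dict model.

```python
def find_deletes(source_keys, final_keys):
    """Keys present in FINAL that no longer exist in a full source key extract."""
    return sorted(set(final_keys) - set(source_keys))


def reconcile(final, source_keys):
    """Remove vanished rows from FINAL and return HISTORY rows for them."""
    deleted = find_deletes(source_keys, final.keys())
    for key in deleted:
        del final[key]
    return [(key, "DELETE") for key in deleted]


# FINAL after the Day 2 delta MERGE: ids 1..11, row 2 still present.
final = {i: "row" for i in range(1, 12)}
# Periodic full extract of keys only: row 2 was deleted at the source.
source_keys = [k for k in range(1, 12) if k != 2]

history_rows = reconcile(final, source_keys)
print(history_rows)  # [(2, 'DELETE')]
print(len(final))    # 10
```

Deletes that happen between reconciliations stay in FINAL until the next full key extract, so the reconciliation interval bounds how stale the deletes can be.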
Please suggest any other solutions/thoughts. Thanks.
sql oracle plsql etl
The problem is that you have hand-rolled a replication mechanism instead of using one of Oracle's built-in methods: replication with Materialized Views, Oracle Streams, CDC or GoldenGate (depending on database version, edition and/or depth of pockets).
– APC
Nov 24 '18 at 7:30
Thanks for your response and edits. Any other thoughts/solutions on the above issue will be appreciated.
– learner
Nov 25 '18 at 5:26
Agree with the above comment about hand-rolling this. However, if you are going to go ahead with it, my solution would be to implement a delete trigger on all your source DB tables that populates a separate log table with the ID of each deleted record and the table it was deleted from. This lets you pull just the deletes from that table rather than pulling the full dataset.
– Shaun Peterson
Nov 25 '18 at 21:31
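The trigger-based approach from the comment above can be sketched the same way. Here SourceTable.delete plays the role of an AFTER DELETE row trigger that writes (table name, deleted ID) to a log, and the ETL applies and then clears that log. The ORDERS table name, the class, and clearing the log after each run are hypothetical illustration choices, not the commenter's exact design.

```python
class SourceTable:
    """Toy source table whose delete() mimics an AFTER DELETE row trigger."""

    def __init__(self, name, rows, delete_log):
        self.name = name
        self.rows = rows
        self.delete_log = delete_log  # shared log: (table_name, deleted_id)

    def delete(self, key):
        del self.rows[key]
        # The "trigger": record which row was deleted from which table.
        self.delete_log.append((self.name, key))


def apply_delete_log(final_tables, delete_log):
    """Apply logged deletes to FINAL, return HISTORY rows, then clear the log."""
    history = []
    for table, key in delete_log:
        if key in final_tables[table]:
            del final_tables[table][key]
            history.append((table, key, "DELETE"))
    delete_log.clear()
    return history


delete_log = []
src = SourceTable("ORDERS", {i: "row" for i in range(1, 11)}, delete_log)
final_tables = {"ORDERS": dict(src.rows)}  # FINAL after the initial load

src.delete(2)  # a delete at the source fires the "trigger"
history = apply_delete_log(final_tables, delete_log)
print(history)                               # [('ORDERS', 2, 'DELETE')]
print(len(final_tables["ORDERS"]), delete_log)  # 9 []
```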
Thanks, Shaun, for responding. Yes, that is a possible solution, but it would involve changing all the sources, which may or may not be in our control. I will check further, though.
– learner
Nov 26 '18 at 14:23
edited Nov 24 '18 at 7:18 by APC
asked Nov 23 '18 at 18:52 by learner