ETL - determine deleted entries - Oracle 12c




















I have summarized my use case in basic form below. Any suggestions would be much appreciated.



ETL:



                      |-> FINAL DB
SOURCE -> INTERIM DB -|
                      |-> HISTORY DB



  1. HISTORY DB is updated with the differences between the entries in INTERIM DB and FINAL DB.

  2. FINAL DB is updated using Oracle's MERGE statement.

  3. On Day 1, we extract data from SOURCE and load 10 entries into INTERIM DB.

  4. We run a PL/SQL flow that updates FINAL DB and HISTORY DB with 10 entries each.

  5. On Day 2, another 10 entries arrive. Relative to Day 1, 1 entry is updated, 1 is deleted, and 1 is inserted.

  6. FINAL DB now has 11 entries in total (the earlier 10 + 1 insert), and HISTORY DB has 13 entries, all added as inserts (the earlier 10 + 1 update + 1 delete + 1 insert).

  7. We use the entries in HISTORY DB with a DELETE status to delete entries from FINAL DB.

  8. This approach works as long as we pull all the entries from SOURCE.

  9. If we pull only the delta (the entries that changed), it produces incorrect deletes. For example, if instead of fetching all 10 entries (as in step 5) I fetch only 2 (1 update and 1 insert), there is no way to identify the real deletes in the HISTORY table, because the 8 unchanged entries that were not fetched are also marked as DELETE.
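To make steps 8 and 9 concrete, here is a minimal sketch of the "in FINAL but not in INTERIM" comparison. It uses Python's built-in sqlite3 module purely as a stand-in for Oracle, and the names `final_t`, `interim_t` and `detect_deletes` are hypothetical, not part of the actual flow. It only illustrates why the same comparison is right for a full pull and wrong for a delta pull.

```python
import sqlite3

def detect_deletes(final_ids, interim_ids):
    """Return the ids that exist in FINAL but are missing from INTERIM."""
    con = sqlite3.connect(":memory:")
    # Hypothetical single-column tables; real tables would carry more columns.
    con.execute("CREATE TABLE final_t (id INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE interim_t (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO final_t VALUES (?)", [(i,) for i in final_ids])
    con.executemany("INSERT INTO interim_t VALUES (?)", [(i,) for i in interim_ids])
    # Anti-join: every FINAL row without a matching INTERIM row is treated as deleted.
    rows = con.execute(
        "SELECT f.id FROM final_t f "
        "WHERE NOT EXISTS (SELECT 1 FROM interim_t i WHERE i.id = f.id) "
        "ORDER BY f.id"
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

day1 = range(1, 11)                              # ids 1..10 loaded on Day 1
# Day 2 (assumed ids): id 3 deleted, id 5 updated, id 11 inserted.
full_pull = [1, 2, 4, 5, 6, 7, 8, 9, 10, 11]     # everything still in SOURCE
delta_pull = [5, 11]                             # only the changed entries

print(detect_deletes(day1, full_pull))    # [3]
print(detect_deletes(day1, delta_pull))   # [1, 2, 3, 4, 6, 7, 8, 9, 10]
```

With the delta pull, the one genuine delete (id 3) cannot be told apart from the 8 unchanged ids that simply were not fetched.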



Question:

The real scenario involves millions of records, so we cannot fetch all the data into INTERIM DB. Oracle's MERGE statement merges the changes into FINAL DB but does not perform the deletes; we plan to handle those by running additional DELETE statements driven by the HISTORY table entries that have the status 'DELETE'. The question is: how can we get the correct entries in the HISTORY table when INTERIM DB holds only delta data?
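For reference, the two-step load described above (merge first, then delete from the HISTORY rows flagged 'DELETE') might look roughly like this. This is a sketch only: sqlite3 stands in for Oracle, `INSERT OR REPLACE` stands in for MERGE, and the table and function names are made up for illustration. The list of deleted ids is passed in by hand, because obtaining it from a delta is exactly the open question.

```python
import sqlite3

def apply_load(con, interim_rows, deleted_ids):
    """Upsert INTERIM rows into FINAL, then remove ids flagged DELETE in HISTORY."""
    # Stand-in for Oracle MERGE: insert new ids, overwrite existing ones.
    con.executemany(
        "INSERT OR REPLACE INTO final_t (id, val) VALUES (?, ?)", interim_rows
    )
    # Record the deletes in HISTORY (assumed to be known here).
    con.executemany(
        "INSERT INTO history_t (id, status) VALUES (?, 'DELETE')",
        [(i,) for i in deleted_ids],
    )
    # Separate delete pass driven by the HISTORY rows with status 'DELETE'.
    con.execute(
        "DELETE FROM final_t "
        "WHERE id IN (SELECT id FROM history_t WHERE status = 'DELETE')"
    )
    con.commit()

def demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE final_t (id INTEGER PRIMARY KEY, val TEXT)")
    con.execute("CREATE TABLE history_t (id INTEGER, status TEXT)")
    # Day 1 state: ids 1..10 already in FINAL.
    con.executemany(
        "INSERT INTO final_t VALUES (?, ?)", [(i, "v1") for i in range(1, 11)]
    )
    # Day 2 delta: id 5 updated, id 11 inserted, id 3 deleted at the source.
    apply_load(con, [(5, "v2"), (11, "v1")], deleted_ids=[3])
    rows = con.execute("SELECT id, val FROM final_t ORDER BY id").fetchall()
    con.close()
    return rows
```

The delete pass is only as good as the HISTORY rows feeding it, which is why the delta case needs some other source of truth for deletes.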



One solution:

Fetch the full data periodically to determine the deletes.



Please suggest any other solutions/thoughts. Thanks.






























  • 2





    The problem is that you have hand-rolled a replication mechanism instead of using one of Oracle's built-in methods: replication with Materialized Views, Oracle Streams, CDC or GoldenGate (depending on database version, edition and/or depth of pockets).

    – APC
    Nov 24 '18 at 7:30











  • Thanks for your response and edits. Any other thoughts/solutions on the above issue will be appreciated.

    – learner
    Nov 25 '18 at 5:26













  • I agree with the comment above about hand-rolling this. However, if you are going to go ahead with it, my solution would be to implement a delete trigger on each of your source DB tables that populates a temporary table with the ID of the deleted record and the table it was deleted from. That would let you pull just the deletes from this table rather than the full dataset.

    – Shaun Peterson
    Nov 25 '18 at 21:31
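    A rough sketch of the trigger idea in the comment above, using Python's sqlite3 module as a stand-in for Oracle (Oracle trigger syntax differs). The `customers` and `deleted_log` tables and both function names are hypothetical, and the log is an ordinary table here rather than a temporary one.

```python
import sqlite3

def build_source():
    """Create a source table plus a delete trigger that logs removed ids."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE deleted_log (table_name TEXT, row_id INTEGER);
        -- Fires once per deleted row and records which table and id it was.
        CREATE TRIGGER customers_ad AFTER DELETE ON customers
        BEGIN
            INSERT INTO deleted_log (table_name, row_id)
            VALUES ('customers', OLD.id);
        END;
    """)
    con.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"c{i}") for i in range(1, 11)],
    )
    return con

def pull_deletes(con):
    """Return the logged deletes and clear the log, as an ETL run might."""
    rows = con.execute(
        "SELECT table_name, row_id FROM deleted_log ORDER BY row_id"
    ).fetchall()
    con.execute("DELETE FROM deleted_log")
    return rows
```

    After `con = build_source()` and `con.execute("DELETE FROM customers WHERE id = 3")`, calling `pull_deletes(con)` returns only the logged delete, so the extract does not need the full dataset to find it.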











  • Thanks for responding, Shaun. Yes, that is a possible solution, but it would involve changing all the sources, which may or may not be under our control. I will look into it further, though.

    – learner
    Nov 26 '18 at 14:23


















sql oracle plsql etl














edited Nov 24 '18 at 7:18









APC











asked Nov 23 '18 at 18:52









learner

























