Versioning Database Persisted Objects, How would you?
(Not related to versioning the database schema)
Applications that interfaces with databases often have domain objects that are composed with data from many tables. Suppose the application were to support versioning, in the sense of CVS, for these domain objects.
For some arbitry domain object, how would you design a database schema to handle this requirement? Any experience to share?
database database-design versioning
add a comment |
(Not related to versioning the database schema)
Applications that interfaces with databases often have domain objects that are composed with data from many tables. Suppose the application were to support versioning, in the sense of CVS, for these domain objects.
For some arbitry domain object, how would you design a database schema to handle this requirement? Any experience to share?
database database-design versioning
See this existing question
– Matthew Watson
Sep 24 '08 at 8:01
This is actually quite a good question and I'm guessing that since you've accepted my answer a penny must have dropped. IMO this ad-hoc history tracking is one of the cardinal architectural sins of Insurance systems (an area that I work in). +1
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:31
1
I accepted your answer because the links you provided gave me the knowledge I needed to make an informed decision. Being given enough knowledge to answer the question yourself is a superior way of learning, than just being spoon-fed generic answers.
– Chris Vest
Sep 25 '08 at 16:25
Glad to be of help - I've found that doing a bit of web research and including the fan-out is a good sweet spot for stackoverflow answers. Many of my better (and more popular) answers have been done in this format, and it seems to be appreciated.
– ConcernedOfTunbridgeWells
Oct 8 '08 at 10:22
add a comment |
(Not related to versioning the database schema)
Applications that interfaces with databases often have domain objects that are composed with data from many tables. Suppose the application were to support versioning, in the sense of CVS, for these domain objects.
For some arbitry domain object, how would you design a database schema to handle this requirement? Any experience to share?
database database-design versioning
(Not related to versioning the database schema)
Applications that interfaces with databases often have domain objects that are composed with data from many tables. Suppose the application were to support versioning, in the sense of CVS, for these domain objects.
For some arbitry domain object, how would you design a database schema to handle this requirement? Any experience to share?
database database-design versioning
database database-design versioning
edited Sep 24 '08 at 10:04
David Schmitt
43.7k24105155
43.7k24105155
asked Sep 24 '08 at 7:41
Chris VestChris Vest
7,46222842
7,46222842
See this existing question
– Matthew Watson
Sep 24 '08 at 8:01
This is actually quite a good question and I'm guessing that since you've accepted my answer a penny must have dropped. IMO this ad-hoc history tracking is one of the cardinal architectural sins of Insurance systems (an area that I work in). +1
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:31
1
I accepted your answer because the links you provided gave me the knowledge I needed to make an informed decision. Being given enough knowledge to answer the question yourself is a superior way of learning, than just being spoon-fed generic answers.
– Chris Vest
Sep 25 '08 at 16:25
Glad to be of help - I've found that doing a bit of web research and including the fan-out is a good sweet spot for stackoverflow answers. Many of my better (and more popular) answers have been done in this format, and it seems to be appreciated.
– ConcernedOfTunbridgeWells
Oct 8 '08 at 10:22
add a comment |
See this existing question
– Matthew Watson
Sep 24 '08 at 8:01
This is actually quite a good question and I'm guessing that since you've accepted my answer a penny must have dropped. IMO this ad-hoc history tracking is one of the cardinal architectural sins of Insurance systems (an area that I work in). +1
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:31
1
I accepted your answer because the links you provided gave me the knowledge I needed to make an informed decision. Being given enough knowledge to answer the question yourself is a superior way of learning, than just being spoon-fed generic answers.
– Chris Vest
Sep 25 '08 at 16:25
Glad to be of help - I've found that doing a bit of web research and including the fan-out is a good sweet spot for stackoverflow answers. Many of my better (and more popular) answers have been done in this format, and it seems to be appreciated.
– ConcernedOfTunbridgeWells
Oct 8 '08 at 10:22
See this existing question
– Matthew Watson
Sep 24 '08 at 8:01
See this existing question
– Matthew Watson
Sep 24 '08 at 8:01
This is actually quite a good question and I'm guessing that since you've accepted my answer a penny must have dropped. IMO this ad-hoc history tracking is one of the cardinal architectural sins of Insurance systems (an area that I work in). +1
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:31
This is actually quite a good question and I'm guessing that since you've accepted my answer a penny must have dropped. IMO this ad-hoc history tracking is one of the cardinal architectural sins of Insurance systems (an area that I work in). +1
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:31
1
1
I accepted your answer because the links you provided gave me the knowledge I needed to make an informed decision. Being given enough knowledge to answer the question yourself is a superior way of learning, than just being spoon-fed generic answers.
– Chris Vest
Sep 25 '08 at 16:25
I accepted your answer because the links you provided gave me the knowledge I needed to make an informed decision. Being given enough knowledge to answer the question yourself is a superior way of learning, than just being spoon-fed generic answers.
– Chris Vest
Sep 25 '08 at 16:25
Glad to be of help - I've found that doing a bit of web research and including the fan-out is a good sweet spot for stackoverflow answers. Many of my better (and more popular) answers have been done in this format, and it seems to be appreciated.
– ConcernedOfTunbridgeWells
Oct 8 '08 at 10:22
Glad to be of help - I've found that doing a bit of web research and including the fan-out is a good sweet spot for stackoverflow answers. Many of my better (and more popular) answers have been done in this format, and it seems to be appreciated.
– ConcernedOfTunbridgeWells
Oct 8 '08 at 10:22
add a comment |
9 Answers
9
active
oldest
votes
Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.
If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.
Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
add a comment |
A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions.
Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.
So to insert into the database:
- increment the generation number
- insert the data
- tag the lifetime of that data with valid from, and a valid to of NULL
If you're updating some data:
- mark all data that's about to be modified as valid to the current generation number
- increment the generation number
- insert the new data with the current generation number
deleting is just a matter of marking the data as terminating at the current generation.
To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.
Example:
Create a person.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |NULL|
Update tel no.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |NULL|
Delete fred:
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |2 |
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
add a comment |
An alternative to strict versioning is to split the data into 2 tables: current and history.
The current table has all the live data and has the benefits of all the performance that you build in.
Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.
add a comment |
If you are using Hibernate JBoss Envers could be an option. You only have to annotate classes with @Audited
to keep their history.
add a comment |
You'll need a master record in a master table that contains the information common among all versions.
Then each child table uses master record id + version no as part of the primary key.
It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.
add a comment |
A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number.
This way you also get backwards compatibility for little cost. Which is always good
add a comment |
ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.
Bad Part: It's Zope tied.
add a comment |
Once an object is saved in a database, we can modify that object any number of times right, If we want to know how many no of times that an object is modified then we need to apply this versioning concept.
When ever we use versioning then hibernate inserts version number as zero, when ever object is saved for the first time in the database. Later hibernate increments that version no by one automatically when ever a modification is done on that particular object.
In order to use this versioning concept, we need the following two changes in our application
Add one property of type int in our pojo class.
In hibernate mapping file, add an element called version soon after id element
add a comment |
I'm not sure if we have the same problem, but I required a large number of 'proposed' changes to the current data set (with chained proposals, ie, proposal on proposal).
Think branching in source control but for database tables.
We also wanted a historical log but this was the least important factor - the main issue was managing change proposals which could hang around for 6 months or longer as the business mulled over change approval and got ready for the actual change to be implemented.
The idea is that users can load up a Change and start creating, editing, deleting the current state of data without actually applying those changes. Revert any changes they may have made, or cancel the entire change.
The only way I have been able to achieve this is to have a set of common fields on my versioned tables:
Root ID: Required - set once to the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned from when updated
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current/proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
The query to get the current state of data according to a change would be;
SELECT * FROM table WHERE (CHANGE_ID IN :ChangeId OR (EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now) AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table WHERE CHANGE_ID IN :ChangeId)))
(The filtering of change-on-change multiples is done outside of this query).
The query to get the current state of data at a point in time would be;
SELECT * FROM table WHERE EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
Common indexes created on (ROOT_ID, EFFECTIVE_FROM), (EFFECTIVE_FROM, EFFECTIVE_TO) and (CHANGE_ID).
If anyone knows a better solution I would love to hear about it.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f125877%2fversioning-database-persisted-objects-how-would-you%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
9 Answers
9
active
oldest
votes
9 Answers
9
active
oldest
votes
active
oldest
votes
active
oldest
votes
Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.
If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.
Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
add a comment |
Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.
If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.
Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
add a comment |
Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.
If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.
Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.
Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.
If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.
Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.
edited Jan 21 '13 at 10:16
answered Sep 24 '08 at 8:29
ConcernedOfTunbridgeWellsConcernedOfTunbridgeWells
52.1k13124186
52.1k13124186
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
add a comment |
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
– Chris Vest
Sep 24 '08 at 8:58
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:00
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
– corsiKa
Sep 22 '10 at 19:59
add a comment |
A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions.
Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.
So to insert into the database:
- increment the generation number
- insert the data
- tag the lifetime of that data with valid from, and a valid to of NULL
If you're updating some data:
- mark all data that's about to be modified as valid to the current generation number
- increment the generation number
- insert the new data with the current generation number
deleting is just a matter of marking the data as terminating at the current generation.
To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.
Example:
Create a person.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |NULL|
Update tel no.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |NULL|
Delete fred:
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |2 |
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
add a comment |
A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions.
Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.
So to insert into the database:
- increment the generation number
- insert the data
- tag the lifetime of that data with valid from, and a valid to of NULL
If you're updating some data:
- mark all data that's about to be modified as valid to the current generation number
- increment the generation number
- insert the new data with the current generation number
deleting is just a matter of marking the data as terminating at the current generation.
To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.
Example:
Create a person.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |NULL|
Update tel no.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |NULL|
Delete fred:
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |2 |
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
add a comment |
A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions.
Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.
So to insert into the database:
- increment the generation number
- insert the data
- tag the lifetime of that data with valid from, and a valid to of NULL
If you're updating some data:
- mark all data that's about to be modified as valid to the current generation number
- increment the generation number
- insert the new data with the current generation number
deleting is just a matter of marking the data as terminating at the current generation.
To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.
Example:
Create a person.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |NULL|
Update tel no.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |NULL|
Delete fred:
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |2 |
A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions.
Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.
So to insert into the database:
- increment the generation number
- insert the data
- tag the lifetime of that data with valid from, and a valid to of NULL
If you're updating some data:
- mark all data that's about to be modified as valid to the current generation number
- increment the generation number
- insert the new data with the current generation number
deleting is just a matter of marking the data as terminating at the current generation.
To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.
Example:
Create a person.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |NULL|
Update tel no.
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |NULL|
Delete fred:
|Name|D.O.B |Telephone|From|To |
|Fred|1 april|555-29384|1 |1 |
|Fred|1 april|555-43534|2 |2 |
edited Sep 24 '08 at 7:56
answered Sep 24 '08 at 7:49
Jim TJim T
10.6k52240
10.6k52240
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
add a comment |
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
– Jason S
May 6 '09 at 13:51
add a comment |
An alternative to strict versioning is to split the data into 2 tables: current and history.
The current table has all the live data and has the benefits of all the performance that you build in.
Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.
add a comment |
An alternative to strict versioning is to split the data into 2 tables: current and history.
The current table has all the live data and has the benefits of all the performance that you build in.
Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.
add a comment |
An alternative to strict versioning is to split the data into 2 tables: current and history.
The current table has all the live data and has the benefits of all the performance that you build in.
Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.
An alternative to strict versioning is to split the data into 2 tables: current and history.
The current table has all the live data and has the benefits of all the performance that you build in.
Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.
answered Sep 24 '08 at 7:59
Jim TJim T
10.6k52240
10.6k52240
add a comment |
add a comment |
If you are using Hibernate JBoss Envers could be an option. You only have to annotate classes with @Audited
to keep their history.
add a comment |
If you are using Hibernate JBoss Envers could be an option. You only have to annotate classes with @Audited
to keep their history.
add a comment |
If you are using Hibernate JBoss Envers could be an option. You only have to annotate classes with @Audited
to keep their history.
If you are using Hibernate JBoss Envers could be an option. You only have to annotate classes with @Audited
to keep their history.
answered Aug 12 '10 at 8:11
deamondeamon
37.2k86243369
37.2k86243369
add a comment |
add a comment |
You'll need a master record in a master table that contains the information common among all versions.
Then each child table uses master record id + version no as part of the primary key.
It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.
add a comment |
You'll need a master record in a master table that contains the information common among all versions.
Then each child table uses master record id + version no as part of the primary key.
It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.
add a comment |
You'll need a master record in a master table that contains the information common among all versions.
Then each child table uses master record id + version no as part of the primary key.
It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.
You'll need a master record in a master table that contains the information common among all versions.
Then each child table uses master record id + version no as part of the primary key.
It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.
answered Sep 24 '08 at 7:57
Roy TangRoy Tang
3,08173465
3,08173465
add a comment |
add a comment |
A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number.
This way you also get backwards compatibility for little cost. Which is always good
add a comment |
A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number.
This way you also get backwards compatibility for little cost. Which is always good
add a comment |
A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number.
This way you also get backwards compatibility for little cost. Which is always good
A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number.
This way you also get backwards compatibility for little cost. Which is always good
answered Sep 24 '08 at 8:31
Robert GouldRobert Gould
41.8k53168259
41.8k53168259
add a comment |
add a comment |
ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.
Bad Part: It's Zope tied.
add a comment |
ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.
Bad Part: It's Zope tied.
add a comment |
ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.
Bad Part: It's Zope tied.
ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.
Bad Part: It's Zope tied.
answered Sep 24 '08 at 8:32
lmslms
52332
52332
add a comment |
add a comment |
Once an object is saved in a database, we can modify that object any number of times right, If we want to know how many no of times that an object is modified then we need to apply this versioning concept.
When ever we use versioning then hibernate inserts version number as zero, when ever object is saved for the first time in the database. Later hibernate increments that version no by one automatically when ever a modification is done on that particular object.
In order to use this versioning concept, we need the following two changes in our application
Add one property of type int in our pojo class.
In hibernate mapping file, add an element called version soon after id element
add a comment |
Once an object is saved in a database, we can modify that object any number of times right, If we want to know how many no of times that an object is modified then we need to apply this versioning concept.
When ever we use versioning then hibernate inserts version number as zero, when ever object is saved for the first time in the database. Later hibernate increments that version no by one automatically when ever a modification is done on that particular object.
In order to use this versioning concept, we need the following two changes in our application
Add one property of type int in our pojo class.
In hibernate mapping file, add an element called version soon after id element
add a comment |
Once an object is saved in a database, we can modify that object any number of times right, If we want to know how many no of times that an object is modified then we need to apply this versioning concept.
When ever we use versioning then hibernate inserts version number as zero, when ever object is saved for the first time in the database. Later hibernate increments that version no by one automatically when ever a modification is done on that particular object.
In order to use this versioning concept, we need the following two changes in our application
Add one property of type int in our pojo class.
In hibernate mapping file, add an element called version soon after id element
Once an object is saved in a database, we can modify that object any number of times right, If we want to know how many no of times that an object is modified then we need to apply this versioning concept.
When ever we use versioning then hibernate inserts version number as zero, when ever object is saved for the first time in the database. Later hibernate increments that version no by one automatically when ever a modification is done on that particular object.
In order to use this versioning concept, we need the following two changes in our application
Add one property of type int in our pojo class.
In hibernate mapping file, add an element called version soon after id element
answered Jun 3 '15 at 10:29
CredittoCreditto
1,0421817
1,0421817
add a comment |
add a comment |
I'm not sure if we have the same problem, but I required a large number of 'proposed' changes to the current data set (with chained proposals, ie, proposal on proposal).
Think branching in source control but for database tables.
We also wanted a historical log but this was the least important factor - the main issue was managing change proposals which could hang around for 6 months or longer as the business mulled over change approval and got ready for the actual change to be implemented.
The idea is that users can load up a Change and start creating, editing, deleting the current state of data without actually applying those changes. Revert any changes they may have made, or cancel the entire change.
The only way I have been able to achieve this is to have a set of common fields on my versioned tables:
Root ID: Required - set once to the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned from when updated
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current/proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
The query to get the current state of data according to a change would be;
SELECT * FROM table WHERE (CHANGE_ID IN :ChangeId OR (EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now) AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table WHERE CHANGE_ID IN :ChangeId)))
(The filtering of change-on-change multiples is done outside of this query).
The query to get the current state of data at a point in time would be;
SELECT * FROM table WHERE EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
Common indexes created on (ROOT_ID, EFFECTIVE_FROM), (EFFECTIVE_FROM, EFFECTIVE_TO) and (CHANGE_ID).
If anyone knows a better solution I would love to hear about it.
add a comment |
I'm not sure if we have the same problem, but I required a large number of 'proposed' changes to the current data set (with chained proposals, ie, proposal on proposal).
Think branching in source control but for database tables.
We also wanted a historical log but this was the least important factor - the main issue was managing change proposals which could hang around for 6 months or longer as the business mulled over change approval and got ready for the actual change to be implemented.
The idea is that users can load up a Change and start creating, editing, deleting the current state of data without actually applying those changes. Revert any changes they may have made, or cancel the entire change.
The only way I have been able to achieve this is to have a set of common fields on my versioned tables:
Root ID: Required - set once to the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned from when updated
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current/proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
The query to get the current state of data according to a change would be;
SELECT * FROM table WHERE (CHANGE_ID IN :ChangeId OR (EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now) AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table WHERE CHANGE_ID IN :ChangeId)))
(The filtering of change-on-change multiples is done outside of this query).
The query to get the current state of data at a point in time would be;
SELECT * FROM table WHERE EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
Common indexes created on (ROOT_ID, EFFECTIVE_FROM), (EFFECTIVE_FROM, EFFECTIVE_TO) and (CHANGE_ID).
If anyone knows a better solution I would love to hear about it.
add a comment |
I'm not sure if we have the same problem, but I required a large number of 'proposed' changes to the current data set (with chained proposals, ie, proposal on proposal).
Think branching in source control but for database tables.
We also wanted a historical log but this was the least important factor - the main issue was managing change proposals which could hang around for 6 months or longer as the business mulled over change approval and got ready for the actual change to be implemented.
The idea is that users can load up a Change and start creating, editing, deleting the current state of data without actually applying those changes. Revert any changes they may have made, or cancel the entire change.
The only way I have been able to achieve this is to have a set of common fields on my versioned tables:
Root ID: Required - set once to the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned from when updated
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current/proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
The query to get the current state of data according to a change would be;
SELECT * FROM table WHERE (CHANGE_ID IN :ChangeId OR (EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now) AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table WHERE CHANGE_ID IN :ChangeId)))
(The filtering of change-on-change multiples is done outside of this query).
The query to get the current state of data at a point in time would be;
SELECT * FROM table WHERE EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
Common indexes created on (ROOT_ID, EFFECTIVE_FROM), (EFFECTIVE_FROM, EFFECTIVE_TO) and (CHANGE_ID).
If anyone knows a better solution I would love to hear about it.
I'm not sure if we have the same problem, but I required a large number of 'proposed' changes to the current data set (with chained proposals, ie, proposal on proposal).
Think branching in source control but for database tables.
We also wanted a historical log but this was the least important factor - the main issue was managing change proposals which could hang around for 6 months or longer as the business mulled over change approval and got ready for the actual change to be implemented.
The idea is that users can load up a Change and start creating, editing, deleting the current state of data without actually applying those changes. Revert any changes they may have made, or cancel the entire change.
The only way I have been able to achieve this is to have a set of common fields on my versioned tables:
Root ID: Required - set once to the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned from when updated
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current/proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
The query to get the current state of data according to a change would be;
SELECT * FROM table WHERE (CHANGE_ID IN :ChangeId OR (EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now) AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table WHERE CHANGE_ID IN :ChangeId)))
(The filtering of change-on-change multiples is done outside of this query).
The query to get the current state of data at a point in time would be;
SELECT * FROM table WHERE EFFECTIVE_FROM <= :Now AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
Common indexes created on (ROOT_ID, EFFECTIVE_FROM), (EFFECTIVE_FROM, EFFECTIVE_TO) and (CHANGE_ID).
If anyone knows a better solution I would love to hear about it.
edited Nov 22 '18 at 1:24
answered Nov 22 '18 at 0:07
BenBen
13
13
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f125877%2fversioning-database-persisted-objects-how-would-you%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
See this existing question
– Matthew Watson
Sep 24 '08 at 8:01
This is actually quite a good question and I'm guessing that since you've accepted my answer a penny must have dropped. IMO this ad-hoc history tracking is one of the cardinal architectural sins of Insurance systems (an area that I work in). +1
– ConcernedOfTunbridgeWells
Sep 24 '08 at 9:31
1
I accepted your answer because the links you provided gave me the knowledge I needed to make an informed decision. Being given enough knowledge to answer the question yourself is a superior way of learning, than just being spoon-fed generic answers.
– Chris Vest
Sep 25 '08 at 16:25
Glad to be of help - I've found that doing a bit of web research and including the fan-out is a good sweet spot for stackoverflow answers. Many of my better (and more popular) answers have been done in this format, and it seems to be appreciated.
– ConcernedOfTunbridgeWells
Oct 8 '08 at 10:22