Managing time in relational databases- P4

Managing time in relational databases- P4

Thể loại: Cơ sở dữ liệu
Lượt xem: 59,825Lượt tải: 4Số trang: 20

Mô tả tài liệu

Tham khảo tài liệu 'managing time in relational databases- p4', công nghệ thông tin, cơ sở dữ liệu phục vụ nhu cầu học tập, nghiên cứu và làm việc hiệu quả

Tóm tắt nội dung

approach is better at tracking changes to objects and to other than metric Temporal Data: Data At this point in our we are concerned with rather than with event data, and with state data that rather than state data that needs to be then are the various options for managing state of all, we need to recognize that there are two kinds of states to manage. One is the state of the things we are the states those things pass through as they change over time. But there is another kind of state, that being the state of the Data, such as rows in tables, can be in one of two or (As we will see in Chapter 12, it can also be in a third state, one in which it is neither correct nor Version tables and assertion tables record, state of objects and the state of our data about those State Data In a Customer table, each row the cur- rent state of a customer. Each time the state of a i.e. each time a row is updated, the old data is the new data. By adding one (or sometimes two) date(s) to the primary key of the table, it becomes a table. But since we already know that there are two temporal that can be with data, we know to ask “What kind of we saw in the Preface, there are and assertion tables. Version tables of changes that happen in the real world, changes to in those tables. Each change is recorded as a new version of an object. Assertion tables keep track of we have made to data we later to be in is recorded as a new assertion about the versions make up a true history of what happened to The make up a virtual logfile of the data in the when temporal data is turn out to be version tables, not assertion tables. In the temporal model [2002, the authors focus on data. is not even alluded to until 2 A TAXONOMY OF DATA METHODS chapter, at which point it is suggested that history” tables be used to manage the other Since receives only a passing that book, we choose to classify the as a IT best practices for managing temporal we will discuss in detail in Chapter 4—once again the are version tables, and error is an issue that is mostly left to take care of itself.4 For the most part, it does so by incorrect data.5 This is why we classify IT as Temporal we call the temporal model was Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in Temporal Data and the Model 2002).6 This model is based in large part on developed by Dr. Lorentzos to manage temporal data by breaking temporal durations down into applying various to those and then the back into as the authors note, is not to temporal data. As we said, except for the chapter in that book, the entire book is a of versioned that chapter, the authors recommend that if there is a to keep track of the assertion time history of a table call it be by means of an auxiliary table which is by the criteria to the best from the rest, the term “best has come to mean little more than “standard What we call “best we discuss in Chapter 4, are standard practices we have seen used by many of our even worse solution is to mix up versions and by creating a new a begin date of Now(), both every time there is a real change, and also every is an error in the data to correct. When that happens, we no longer have a the changes things went through, because we cannot versions And we no longer have a “virtual logfile” of because we how far back the should actually have taken word “model”, as used here and also in the phrases model” model” obviously doesn’t refer to a data model of specific It means something like theory, but with an emphasis on its problems. So “the model”, as we use the term, for something like theory as in current Chapter 2 A TAXONOMY OF DATA addition, these authors do not attempt, in their book, how this method of managing temporal data would work with current Like much of the research on temporal data, they allude to SQL other that do not yet exist, and so their book is in large part a to the standards to adopt the changes to the SQL language which they our own concern is with how to implement with today’s and also with how to sup- port both kinds of data, as well as fully we will have little more to say about the model in this several decades, a best practice has emerged in temporal queryable state data. It is to manage this kind of data by otherwise tables. The result tables which, logically speaking, are tables which com- bine the history tables and current tables described present and future states of for example, are kept in one and the same Customer table. may or may not be flagged; but if they are not, it will be versions created because something about a changed from versions created because past was entered On the other hand, if they the and use of these flags will difficult and are many on the theme of which we have grouped into four major We will in Chapter 4. The IT community has always used the term “version” for this kind of data. And this seems to awareness of an important concept that, as we shall see, is cen- tral to the Asserted approach to temporal data. For the term “version” naturally raises the question “Aversion of what?”, to which our answer is “A version of anything that can persist over time”. This is the concept of a object, and it is, most what Asserted is State Data We now come to our second option, which is to versions and and, most This is data of both Dr. Rick book [2000, and of our 2 A TAXONOMY OF DATA METHODS 43 The Standard Temporal we call the standard temporal model was Dr. Rick Snodgrass in his book in SQL on the computer science work current at that time, and on the work Dr. Snodgrass and others had done on the TSQL (temporal SQL) proposal to the SQL it shows how to implement both data using DBMSs and SQL. We emphasize that, as we are writing, Dr. book is a decade old. We use it as our baseline view of computer on data because most of the computer exists in the form of articles in journals that are not readily to many IT We that Dr. Snodgrass did not write that book as a of computer science research for an IT he wrote it as a of how some of that be adapted to provide a means of managing with the SQL and the DBMSs available at that of the greatest strengths of the standard model is that and both the and the temporal data at the level of SQL For example, it shows us the kind of code that is needed to apply the of entity integrity and integrity to And for any readers who might think that temporal is just a small step beyond the they are already familiar with, many of the shown in Dr. book should disabuse them of that Asserted Temporal we call the Asserted temporal model is our own approach to managing temporal data. Like the it attempts to manage temporal data with current and current SQL. The Asserted model of and data supports all of the of model. In addition, it extends the standard of time by data to be to a table prior to the time when that data will appear in the table as data, available for use. This is done by means of deferred which result in those being the inserted, updated or logically Chapter 2 A TAXONOMY OF DATA resulting from those Deferred in the same tables as other not be available to normal queries. But once time in the real world reaches the beginning of their they will, by that very fact, become currently part of the data that makes up the it is perceived by its emphasize that deferred are not the same thing as rows what things will be like at some time in Those latter rows are current claims about what be like in the future. They are are rows what things were, are, or will be like, but rows which we are not yet willing to true They are way that Asserted differs from the temporal model is in the and integrity The of integrity is made possible by temporal physical Temporal are the ones that users write. The physical are what the DBMS applies to asserted version tables. The Framework (AVF) uses an API to accept Once it validates them, the AVF into one or more physical means of triggers generated from a of a model together with metadata, the temporal semantic as it submits to the of these integrity is made by the concept of an episode. With a row an object can be inserted into that table at some point in time, and later deleted from the table. After it of course, that table no longer contains the the row was ever present. to the period of time during which that row existed in that would be an episode in an asserted version table, one or more rows for the same an episode of an object in an asserted version table is in exactly the period of time that a row for that object in a table. And just as a deletion in a table can sometime later be followed by the insertion of a new row with the same primary key, the of an 7The term “deferred was suggested by Dr. Snodgrass during a series of email exchanges which the authors had with him in the summer of 2 A TAXONOMY OF DATA METHODS in an assertion version table can sometime later by the insertion of a new episode for the same a table, each row must conform to and integrity In an asserted ver- sion table, each version must conform to temporal entity and temporal integrity As we will see, the parallels are in more than name only. Temporal really is entity integrity applied to temporal data. integrity really is integrity applied entries whose form strong are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping of each chapter. There will usually be sev- eral other, and often many other, glossary entries that are in the list, and we recommend that the Glossary whenever an term is Framework entity integrity integrity temporal Asserted temporal standard temporal model 46 Chapter 2 A TAXONOMY OF DATA The Origins of Asserted Computer Science Research 51 4. The Origins of Asserted The Best Practices 75 5. The Core Concepts of Asserted 95 6. Diagrams and Other Notations 119 7. The Basic Scenario 141 Part 1 provided the context for Asserted a a taxonomy of various ways in which temporal data has been managed over the last several decades. Here in Part 2, Asserted itself and prepare the way for the detailed in Part 3 of how Asserted Chapter 3, we discuss the origins of Asserted science research. Based on the work of we introduce the concepts of a clock tick and an atomic clock tick, the latter of which, in their is called a chronon. We go on to discuss the various ways in periods are by pairs of dates or of SQL does not directly support the concept of a time are only a finite number of ways that two time be situated, with respect to one another, along a Time in Doi: # 2010 Elsevier Inc. All rights of in any form reserved. For example, one time period may entirely precede follow another, they may partially overlap or be they may start at different times but end at the same time, and so on. These different among pairs of time per- iods have been and and are called the Allen They will play an important role in of Asserted because there are in which we will want to compare time periods. With the Allen as a check, we can make sure that we have all the important section of this chapter discusses the between the computer science notion of and our own notion of assertion time. This is based on our of the concepts of deferred and deferred and for their under the more general concept of a pipeline Chapter 4, we discuss the origins of Asserted in IT best those related to that these practices are on four basic data. In this chapter, we present each of by means of examples which include sample a running on how inserts, updates and the data in those Chapter 5, we present the The core concepts of objects, episodes, and are defined, a which leads us to the statement of Asserted that every row in an asserted version table is the assertion of a version of an episode of an object. We continue on to discuss how are in asserted version tables, how integrity and temporal integrity enforce the of Asserted and finally how the of temporal Chapter 6, we introduce the schema common to version tables, as well as various diagrams and will be used in the rest of the book. We also introduce the topic of how Asserted supports the dynamic hide the of that schema from query would otherwise likely be confused by that an object is by a row in a the sequence of events begins with the insertion of that row, continues with zero or more updates, and either with no further activity, or ends when the row is When an object is in an asserted Part 2 AN TO ASSERTED the result includes one row to the insert in the table, rows to to the original row in the table, and row if a delete takes place. This events what we call the basic scenario of both and asserted version tables. In Chap- ter 7, we describe how the basic scenario works when the target of that activity is an asserted version entries whose form strong are grouped together in the following list. The same Glossary entries may be grouped together in different ways at the end of different chapters, each grouping of each chapter. There will usually be sev- eral other, and often many other, Glossary entries that are in the list, and we recommend that the Glossary whenever an term is clock entity 2 AN TO ASSERTED ORIGINS OF Roots of Asserted Science Research and Clock Ticks 55 Time Periods and Date Pairs 56 The Very Concept of 63 Allen Indexing to SQL 72 We begin this chapter with an overview of the three sources computer science work on temporal practices in the IT related to work by the authors We then spend the rest of this chapter computer science data and the relevance of some of to Asserted Roots of Asserted the last three decades, the computer science done extensive work on temporal data, and on data. During that same period of time, the IT has developed various forms of all of which of managing one of the two kinds of may be thought of as a method of both uni- and data which, unlike the of temporal data that rows in tables represent versions of things and Time in Doi: # 2010 Elsevier Inc. All rights of in any form reserved. these rows do not stand alone as semantic that are versions of the same thing are related to by that very fact. Versions that are versions of the and also that together represent an unbroken period of time in the life of that thing, are even more closely related to Groups of versions of the are semantic objects that must bemanaged as single objects, even though they may consist of any number of rows. These single semantic objects are what we call may also be thought of as a form a technique for managing data that in the IT industry over the last But best practice on that theme, the full semantics of versions, i.e. that it is possible to do with versions. In addition, also the of versions with of andwith of data. As we pointed out earlier, it directly manages tables, both forms of tables, as well as tables, as views over those embracing from computer and from business IT best we believe three new concepts to the field of temporal data The first concept is that of In making episodes central to its of data, Asserted breaks with a basic the model, which is that integrity to rows. For example, in model, requires that every foreign key in one a valid which that row has to one In Asserted a foreign key which one row, called a version, has to an to any single row which is part of that second concept is that of the of We define a pipeline dataset as any of data that is not a table, but that contains or origin is such a table.1 Pipeline term indicates that these tables contain “real” data. processes are being carried out to maintain these tables, and to keep as accurate, secure and current as possible. Regularly scheduled as well as ones, are being carried out to access this data to obtain So tables are the tables that the business tries to current and secure, and from which it draws the it needs to carry out its mission and meet its Chapter 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE contains data destined for tables are datasets. Pipeline datasets which contain data tables are outflow pipeline tables are one example of a pipeline dataset. Sets in files or tables and waiting to to their target tables, are another example. While the use of versions history tables by the tables whose history they track, the use of and deferred batch files waiting to be applied to a database by also them within their target tables. In this book, we will show how the use of these managed objects costs of databases by replacing external files or tables such as history tables, files and logfiles, within tables rather external to third concept is that of as it applies to of temporal data. Asserted the of temporal tables by between temporal and Temporal are inserts, updates as written by their authors, whose targets are tables. They are submitted to the Asserted (AVF), not directly to the DBMS. The AVF into physical insert and update which it sub- mits to the DBMS. These physical implement expressed in those temporal and we note that, except for temporal insert one will always be into multiple In Part 3, we discuss several temporal show both the physical the AVF creates these temporal and also how the AVF knows how to map between the two. The practical of Asserted is its of the of temporal and the processes that manage them. is an method of managing temporal relieves data modelers of the burden of designing data models that must define and integrity on them. It of the burden of writing code that enforces the rules which provide of temporal data. And it relieves query or end users, of the burden of writing com- plex queries that must check for temporal gaps 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE RESEARCH along one or two temporal among a set of rows accessed by the origins of Asserted in computer in IT best and in from are in Figure 3.1. Although deferred and deferred are the for of several pipeline datasets, they are shown as a separate item in Figure 3.1 because of their Science 2000, Dr. Rick Snodgrass stated that academics had at that time, over 1500 articles on temporal data [2000, xviii]. And over the last decade, work has been done. For our purposes, this work two (i) research relevant to our task of building a method of temporal data that today’s and (ii) research on future the of temporal functions in and for the of these functions to the current SQL on within the computer science of within the IT as managed of and of and of temporal integrity 3.1 The Origins of Asserted Chapter 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE and Clock Ticks An atomic clock tick is the smallest interval of time the DBMS that can elapse between any two to a We note that the standard term for an atomic clock tick is a chronon. A clock tick is an interval of time defined on the basis of atomic clock that is used in an Asserted database to two time periods of rows in asserted version tables, and also to indicate several important points in time. In asserted clock ticks are used for effective time begin and end dates and for episode begin dates; and atomic clock ticks are used time begin and end dates, and for row create ticks may be defined at any level of by the DBMS. A timestamp is a tick. A date a daily clock tick. A 1-month a situation in which a database is updated from a of at most once a time periods measured in clock ticks of are compared, the technical issues involved in are complex, even if the issues are not. In our own of Asserted we issues by requiring that all asserted version tables in the same database use clock ticks of the same In we assume, albeit somewhat that version tables use a clock that ticks once amonth otherwise noted), and we will use the first day of to represent those clock ticks. This means that these asserted version tables happen on the first of each by the same token, that the state of the database after update will remain unaltered for the rest of the no matter what we choose for our clock is the issue of how clock ticks of one level of onto those of another level. For example, if we choose a clock tick of one day, when does that day start? Does June 1st, 2012 begin at or perhaps at or perhaps even at simplest solution is to let the DBMS determine the us; and in most cases this is perfectly adequate. But IT should at least be aware that these do are not referring here to the atomic clock on which standard time An atomic clock tick, in the sense being defined here, is a logical concept, not 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE RESEARCH should also be aware that another issue exists, that of how SQL map to when things happen in the SQL uses Universal Time (UTC), which is based on cesium clocks, which might lead us to conclude that SQL are extremely accurate. Precise they may be; but issues of accuracy involved in their use do example, suppose we know that an has just happened will happen again in exactly 725 days, 5 hours and 23 seconds. If we know the SQL timestamp when the event occurred, can we calculate exactly what the will be when the event happens again? We can, calculate the number of seconds that will have the two events, that number being it is a mistake to conclude that from knowing of the first event and also knowing the number until the second event, we can calculate the the second event. In fact, we cannot reliably do that. The rea- son is that from the moment the first timestamp is current to that the second timestamp is current, one or more may have been added to or from the count of cesium clock ticks between those two and for the same reason that a leap day is added to the Gregorian once every four years. But unlike the leap year is regular enough to count on, we do not know, how many leap seconds might be added or now and the next of that or when those may of UTC and its by means of leap seconds is seldom an issue in IT. The reason is that for business purposes, we all tag our data with SQL and we all regard those an accurate record of when business events happen in the In addition, for most business purposes, we assume that a SQL timestamp plus a defined interval of time will result in a second timestamp that when some second event Periods and Date Pairs SQL does not recognize a period of time as a managed we have to use a pair of dates. There are four ways we can use a pair of dates to do this. Either the beginning date, or 3A more detailed of how SQL relate to events in Chapter 3 of Dr. book. 56 Chapter 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE ending date, or both, may or may not be included in the they delimit. If a date is not included, the time period is said to be open on that end; otherwise it is said to be of all four ways of time periods and end dates can be found in the book temporal model [2000, and also in the the temporal model [2002, Date, Here, we discuss just two of the four and This is because the is actually the best way of time whereas the appears to be the best way of doing two versions of the same object, V1 and V2. In in this example, the first time period for the object starts on 2/19/2011 and ends on and the second starts and ends on there is no gap in time between these two time periods. 5/22/2011 are clock are we to represent this? Two of the are shown in Figures 3.2 and that the first time period starts on 2/19/2011 and ends and that the second time period starts on ends on it might seem that the is obviously the correct one, and that the is obviously wrong. But that is not the case. If wewanted to argue in support of the we could try to show that the 3.2 A of Two Time 3.3 A of Two Time this section, we will assume that our clock ticks once a 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE RESEARCH not be used by querying for the time period that date On the it that we can’t tell to which version 5/23/2011 belongs. But in fact, we can. We just need the following WHERE clause in our SQL D1 <¼ AND < D2 With this the query will correctly pick out V2 3.2. So one reason we might have thought that the is right is that its begin and end dates are the we used to set up the example when we said that one and ends on one set of dates and the other period ends on the other set of dates. Another reason we might that the is right is that we are looking for a pair of dates that a third date is in one sense, is on display in the statement “Pick a number between 1 and 10”. We all know that 1 and 10 are that we can pick. And the SQL BETWEEN to this sense of the word. So if we use the we can BETWEEN D1 AND D2 In other words, when a is used, we can rely on SQL’s BETWEEN to express what we by which is what we might call the of there is another sense of between, which is on display in the statement “The abandoned car is somewhere 6 and 10, along I-65N, heading out of We all know that to find the car, we should start at milepost 6 up to milepost 10. In we know that we to search past milepost 10, i.e. past the start of the heading out of is the sense of between used in the The closest English would be “from . . . . . up to”, in the sense of “from” and then “up to but not But since each SQL predicate returns the correct result, is used with its method of of time, each method is then, is the advantage of using the Well, look again at the two ways of V1 and V2. In both cases, V1 and V2 are We know we have set up the example that way. And with we can see that there 58 Chapter 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE no gap between V1 and V2. But with the we don’t know whether there is a gap or not. To make that we also need to know at what rate the Assuming for this example that the clock a day, then only when given this we from a that V1 and V2 have no gap between of whether or not two of the same object are is not just a the human beings trying to the data. It is also a problem for the code which will often be called upon to of two versions, whether or not they i.e. whether or not there is a clock tick With a the code only has two values to see whether or not they are matters worse, the of DBMS has changed over time. these seconds. Later, they were expressed as then and now we at the point where will be expressed With this sliding scale, across DBMSs and we might not even know the at which to try to determine if two versions are or are not With a the code to make this is trivial. With a it may the SQL standards groups can agree on to the SQL standard, part of that agreement will be a way of time periods directly, without rely- ing on the confusing of various ways of of dates to represent time periods. But our concern, in is with today’s SQL, and so we must choose a of from this point will use the for time and will use the notation 9999 in our to latest date that the DBMS can recognize and for a PERIOD datatype has been by such vendors as Oracle But what that support means may differ from vendor to vendor. Can a unique index be defined on a PERIOD datatype? Which of the Allen are also So, lacking a standard for the PERIOD datatype, we will continue the practice of defining periods of time in terms of their begin and in 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE RESEARCH latest date cannot be used in effective or assertion If it is used as business data, then it has whatever users assign to it, which will probably be as the of a day which is still a long way off. But if it is used in effective end dates, it is treated as a date by the DBMS but is as a date by the users of the system. Instead, to it means “later than now”. we can say that it means “end of the time period not known, but assumed to not have happened yet”. We will generally use the phrase notice to represent this effect until or asserted as true until further we specify an insert or an update against a table, we normally do not know when, if ever, the will be next updated or deleted. By the same token, when we specify a temporal insert or a temporal update against table, we normally do not know when, if ever, time period of the new version will end, because we when, if ever, a update or deletion will data modelers will argue that NULL should be used data. While this is true, the of the physical model will benefit from this non-null datatype, as will be explained in Chapter 15. Moreover, we do know one this unknown date. For as long as it is unknown, we choose to assume that it has not happened yet. In other as long as it is unknown, we want the DBMS to tell us that it is greater than the value of Now(), whenever we ask that If we used NULL instead of a data value, the DBMS would not give us that answer to that an effective end date is unknown, then, instead of we will set it to or to whatever the date that the DBMS So the semantics is date unknown (but still in but the is a real date, one in this case that is nearly 8000 years from now. How does a date like that semantics like we assume that the date will not be represent that far-off New Year’s Eve. So it is available to use for a special purpose. In the case of an effective end date, we often insert or update versions without knowing when the time period of the new version may end. So in this means that the end of the effective time period of the version thus marked is unknown, and that the time remain in effect until further notice. For example, when address is added to the database, we usually will not know when that customer will move from that Chapter 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE the status of that address is that it will remain in effect customer moves, at some unknown date in the the case of an assertion end date, that date is always set when a row is created. This reflects the we would never want to assert that a version is true, knowing that, at some later point in time, we will minds and no longer be willing to make that While we adopt this in this book, and in the is our of these concepts, we note that a more robust semantics might not include this and that, later releases of our Asserted may permit assertion end dates to be specified on temporal effective end date may be changed, i.e. the time period it ends may be shortened or as long as the not violate temporal integrity which we in Part 3. As for an assertion end date, it may be one of four an assertion end date may be changed because we real- ize that the assertion is and we have the for that error. In that case, the incorrect assertion will cease to i.e. it will be as part of the same of work that inserts the and the same date will be used both for the assertion end date of the and the assertion begin date of the an assertion end date may be changed because that an assertion is incorrect even though we do not know what the correct data is, or else just because, for some reason, we do not wish to make that assertion any longer. Third, we that an assertion about the future is no longer probably because both we and our have and more accurate In either we briefly allude to, later on, what we assert by means of a row in a table is not so much that the statement made by the row is true. It is, rather, that the statement An roughly, is one which is good enough to base decision on. about the past or the present must be true in order to be But about the future, such as lack a known truth value until the future time about which they comes to pass, at which point those become either true or may be when first made, and thus be true. Moreover, they may be when first made but, over time, even prior to becoming true or false. (See, later in this chapter, the Very Concept of See also Chapter 13, the section Past Claims About the 3 THE ORIGINS OF ASSERTED COMPUTER SCIENCE RESEARCH 61