Managing time in relational databases- P4

Thể loại: Cơ sở dữ liệu
Lượt xem: 59,813Lượt tải: 4Số trang: 20

Mô tả tài liệu

Tham khảo tài liệu 'managing time in relational databases- p4', công nghệ thông tin, cơ sở dữ liệu phục vụ nhu cầu học tập, nghiên cứu và làm việc hiệu quả

Tóm tắt nội dung

approach is better at tracking changes to persistent objects and to relationships other than metric balances. State Temporal Data: Uni-Temporal and Bi-Temporal Data At this point in our discussion, we are concerned with state data rather than with event data, and with state data that is queryable rather than state data that needs to be reconstructed. What then are the various options for managing temporal queryable state data? First of all, we need to recognize that there are two kinds of states to manage. One is the state of the things we are interested in, the states those things pass through as they change over time. But there is another kind of state, that being the state of the data itself. Data, such as rows in tables, can be in one of two states: correct or incorrect. (As we will see in Chapter 12, it can also be in a third state, one in which it is neither correct nor incor- rect.) Version tables and assertion tables record, respectively, the state of objects and the state of our data about those objects. Uni-Temporal State Data In a conventional Customer table, each row represents the cur- rent state of a customer. Each time the state of a customer changes, i.e. each time a row is updated, the old data is overwritten with the new data. By adding one (or sometimes two) date(s) or timestamp(s) to the primary key of the table, it becomes a uni- temporal table. But since we already know that there are two dif- ferent temporal dimensions that can be associated with data, we know to ask “What kind of uni-temporal table?” As we saw in the Preface, there are uni-temporal version tables and uni-temporal assertion tables. Version tables keep track of changes that happen in the real world, changes to the objects represented in those tables. Each change is recorded as a new version of an object. Assertion tables keep track of correct- ions we have made to data we later discovered to be in error. Each correction is recorded as a new assertion about the object. The versions make up a true history of what happened to those objects. The assertions make up a virtual logfile of corrections to the data in the table. Usually, when table-level temporal data is discussed, the tables turn out to be version tables, not assertion tables. In their book describing the alternative temporal model [2002, Date, Darwen, Lorentzos], the authors focus on uni-temporal versioned data. Bi-temporality is not even alluded to until the Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS 41 penultimate chapter, at which point it is suggested that “logged time history” tables be used to manage the other temporal dimension. Since bi-temporality receives only a passing mention in that book, we choose to classify the alternative temporal model as a uni-temporal model. In IT best practices for managing temporal data—which we will discuss in detail in Chapter 4—once again the temporal tables are version tables, and error correction is an issue that is mostly left to take care of itself.4 For the most part, it does so by overwriting incorrect data.5 This is why we classify IT best practices as uni-temporal models. The Alternative Temporal Model What we call the alternative temporal model was developed by Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in their book Temporal Data and the Relational Model (Morgan- Kaufmann, 2002).6 This model is based in large part on tec- hniques developed by Dr. Lorentzos to manage temporal data by breaking temporal durations down into temporally atomic components, applying various transformations to those compo- nents, and then re-assembling the components back into those temporal durations—a technique, as the authors note, whose applicability is not restricted to temporal data. As we said, except for the penultimate chapter in that book, the entire book is a discussion of uni-temporal versioned tables. In that chapter, the authors recommend that if there is a require- ment to keep track of the assertion time history of a table (which they call “logged-time history”), it be implemented by means of an auxiliary table which is maintained by the DBMS. 4Lacking criteria to distinguish the best from the rest, the term “best practices” has come to mean little more than “standard practices”. What we call “best practices”, and which we discuss in Chapter 4, are standard practices we have seen used by many of our clients. 5An even worse solution is to mix up versions and assertions by creating a new row, with a begin date of Now(), both every time there is a real change, and also every time there is an error in the data to correct. When that happens, we no longer have a history of the changes things went through, because we cannot distinguish versions from corrections. And we no longer have a “virtual logfile” of corrections because we don’t know how far back the corrections should actually have taken effect. 6The word “model”, as used here and also in the phrases “alternative model” and “Asserted Versioning model” obviously doesn’t refer to a data model of specific subject matter. It means something like theory, but with an emphasis on its applicability to real-world problems. So “the relational model”, as we use the term, for example, means something like “relational theory as implemented in current relational technology”. 42 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS In addition, these authors do not attempt, in their book, to explain how this method of managing temporal data would work with current relational technology. Like much of the computer science research on temporal data, they allude to SQL operators and other constructs that do not yet exist, and so their book is in large part a recommendation to the standards committees to adopt the changes to the SQL language which they describe. Because our own concern is with how to implement temporal concepts with today’s technologies, and also with how to sup- port both kinds of uni-temporal data, as well as fully bi-temporal data, we will have little more to say about the alternative tempo- ral model in this book. Best Practices Over several decades, a best practice has emerged in manag- ing temporal queryable state data. It is to manage this kind of data by versioning otherwise conventional tables. The result is versioned tables which, logically speaking, are tables which com- bine the history tables and current tables described previously. Past, present and future states of customers, for example, are kept in one and the same Customer table. Corrections may or may not be flagged; but if they are not, it will be impossible to distinguish versions created because something about a cus- tomer changed from versions created because past customer data was entered incorrectly. On the other hand, if they are flagged, the management and use of these flags will quickly become difficult and confusing. There are many variations on the theme of versioning, which we have grouped into four major categories. We will discuss them in Chapter 4. The IT community has always used the term “version” for this kind of uni-temporal data. And this terminology seems to reflect an awareness of an important concept that, as we shall see, is cen- tral to the Asserted Versioning approach to temporal data. For the term “version” naturally raises the question “Aversion of what?”, to which our answer is “A version of anything that can persist and change over time”. This is the concept of a persistent object, and it is, most fundamentally, what Asserted Versioning is about. Bi-Temporal State Data We now come to our second option, which is to manage both versions and assertions and, most importantly, their interdependencies. This is bi-temporal data management, the subject of both Dr. Rick Snodgrass’s book [2000, Snodgrass] and of our book. Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS 43 The Standard Temporal Model What we call the standard temporal model was developed by Dr. Rick Snodgrass in his book Developing Time-Oriented Database Applications in SQL (Morgan-Kaufmann, 2000). Based on the computer science work current at that time, and especially on the work Dr. Snodgrass and others had done on the TSQL (temporal SQL) proposal to the SQL standards committees, it shows how to implement both uni-temporal and bi-temporal data management using then-current DBMSs and then-current SQL. We emphasize that, as we are writing, Dr. Snodgrass’s book is a decade old. We use it as our baseline view of computer science work on bi-temporal data because most of the computer science literature exists in the form of articles in scientific journals that are not readily accessible to many IT professionals. We also emphasize that Dr. Snodgrass did not write that book as a com- pendium of computer science research for an IT audience. Instead, he wrote it as a description of how some of that research could be adapted to provide a means of managing bi-temporal data with the SQL and the DBMSs available at that time. One of the greatest strengths of the standard model is that it discusses and illustrates both the maintenance and the querying of temporal data at the level of SQL statements. For example, it shows us the kind of code that is needed to apply the temporal analogues of entity integrity and referential integrity to temporal data. And for any readers who might think that temporal data management is just a small step beyond the versioning they are already familiar with, many of the constraint-checking SQL statements shown in Dr. Snodgrass’s book should suffice to disabuse them of that notion. The Asserted Versioning Temporal Model What we call the Asserted Versioning temporal model is our own approach to managing temporal data. Like the standard model, it attempts to manage temporal data with current tech- nology and current SQL. The Asserted Versioning model of uni-temporal and bi-tem- poral data management supports all of the functionality of the standard model. In addition, it extends the standard model’s notion of transaction time by permitting data to be physically added to a table prior to the time when that data will appear in the table as production data, available for use. This is done by means of deferred transactions, which result in deferred assertions, those being the inserted, updated or logically deleted 44 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS rows resulting from those transactions.7 Deferred assertions, although physically co-located in the same tables as other data, will not be immediately available to normal queries. But once time in the real world reaches the beginning of their assertion periods, they will, by that very fact, become currently asserted data, part of the production data that makes up the database as it is perceived by its users. We emphasize that deferred assertions are not the same thing as rows describing what things will be like at some time in the future. Those latter rows are current claims about what things will be like in the future. They are ontologically post-dated. Deferred assertions are rows describing what things were, are, or will be like, but rows which we are not yet willing to claim make true statements. They are epistemologically post-dated. Another way that Asserted Versioning differs from the stan- dard temporal model is in the encapsulation and simplification of integrity constraints. The encapsulation of integrity con- straints is made possible by distinguishing temporal transactions from physical transactions. Temporal transactions are the ones that users write. The corresponding physical transactions are what the DBMS applies to asserted version tables. The Asserted Versioning Framework (AVF) uses an API to accept temporal transactions. Once it validates them, the AVF translates each temporal transaction into one or more physical transactions. By means of triggers generated from a combination of a logical data model together with supplementary metadata, the AVF enforces temporal semantic constraints as it submits physical transactions to the DBMS. The simplification of these integrity constraints is made possi- ble by introducing the concept of an episode. With non-temporal tables, a row representing an object can be inserted into that table at some point in time, and later deleted from the table. After it is deleted, of course, that table no longer contains the information that the row was ever present. Corresponding to the period of time during which that row existed in that non-temporal table, there would be an episode in an asserted version table, consisting of one or more temporally contiguous rows for the same object. So an episode of an object in an asserted version table is in effect during exactly the period of time that a row for that object would exist in a non-temporal table. And just as a deletion in a conven- tional table can sometime later be followed by the insertion of a new row with the same primary key, the termination of an 7The term “deferred transaction” was suggested by Dr. Snodgrass during a series of email exchanges which the authors had with him in the summer of 2008. Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS 45 episode in an assertion version table can sometime later be followed by the insertion of a new episode for the same object. In a non-temporal table, each row must conform to entity integrity and referential integrity constraints. In an asserted ver- sion table, each version must conform to temporal entity integ- rity and temporal referential integrity constraints. As we will see, the parallels are in more than name only. Temporal entity integrity really is entity integrity applied to temporal data. Tem- poral referential integrity really is referential integrity applied to temporal data. Glossary References Glossary entries whose definitions form strong inter- dependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be sev- eral other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered. as-is as-was Asserted Versioning Asserted Versioning Framework (AVF) episode persistent object state thing physical transaction temporal transaction temporal entity integrity (TEI) temporal referential integrity (TRI) the alternative temporal model the Asserted Versioning temporal model the standard temporal model 46 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS PART 2 AN INTRODUCTION TO ASSERTED VERSIONING Chapter Contents 3. The Origins of Asserted Versioning: Computer Science Research 51 4. The Origins of Asserted Versioning: The Best Practices 75 5. The Core Concepts of Asserted Versioning 95 6. Diagrams and Other Notations 119 7. The Basic Scenario 141 Part 1 provided the context for Asserted Versioning, a history and a taxonomy of various ways in which temporal data has been managed over the last several decades. Here in Part 2, we introduce Asserted Versioning itself and prepare the way for the detailed discussion in Part 3 of how Asserted Versioning actually works. In Chapter 3, we discuss the origins of Asserted Versioning in computer science research. Based on the work of computer scientists, we introduce the concepts of a clock tick and an atomic clock tick, the latter of which, in their terminology, is called a chronon. We go on to discuss the various ways in which time periods are represented by pairs of dates or of timestamps, since SQL does not directly support the concept of a time period. There are only a finite number of ways that two time periods can be situated, with respect to one another, along a common Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00024-8 Copyright # 2010 Elsevier Inc. All rights of reproduction in any form reserved. 47 timeline. For example, one time period may entirely precede or entirely follow another, they may partially overlap or be identi- cal, they may start at different times but end at the same time, and so on. These different relationships among pairs of time per- iods have been identified and catalogued, and are called the Allen relationships. They will play an important role in our discussions of Asserted Versioning because there are various ways in which we will want to compare time periods. With the Allen relationships as a completeness check, we can make sure that we have considered all the possibilities. Another important section of this chapter discusses the dif- ference between the computer science notion of transaction time, and our own notion of assertion time. This difference is based on our development of the concepts of deferred trans- actions and deferred assertions, and for their subsumption under the more general concept of a pipeline dataset. In Chapter 4, we discuss the origins of Asserted Versioning in IT best practices, specifically those related to versioning. We believe that these practices are variations on four basic methods of versioning data. In this chapter, we present each of these methods by means of examples which include sample tables and a running commentary on how inserts, updates and deletes affect the data in those tables. In Chapter 5, we present the conceptual foundations of Asserted Versioning. The core concepts of objects, episodes, vers- ions and assertions are defined, a discussion which leads us to the fundamental statement of Asserted Versioning, that every row in an asserted version table is the assertion of a version of an episode of an object. We continue on to discuss how time periods are represented in asserted version tables, how temporal entity integrity and temporal referential integrity enforce the core semantics of Asserted Versioning, and finally how Asserted Versioning internalizes the complexities of temporal data management. In Chapter 6, we introduce the schema common to all asserted version tables, as well as various diagrams and notations that will be used in the rest of the book. We also introduce the topic of how Asserted Versioning supports the dynamic views that hide the complexities of that schema from query authors who would otherwise likely be confused by that complexity. When an object is represented by a row in a non-temporal table, the sequence of events begins with the insertion of that row, continues with zero or more updates, and either continues on with no further activity, or ends when the row is eventually deleted. When an object is represented in an asserted version 48 Part 2 AN INTRODUCTION TO ASSERTED VERSIONING table, the result includes one row corresponding to the insert in the non-temporal table, additional rows corresponding to the updates to the original row in the non-temporal table, and an additional row if a delete eventually takes place. This sequence of events constitutes what we call the basic scenario of activity against both conventional and asserted version tables. In Chap- ter 7, we describe how the basic scenario works when the target of that activity is an asserted version table. Glossary References Glossary entries whose definitions form strong inter- dependencies are grouped together in the following list. The same Glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be sev- eral other, and often many other, Glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered. Allen relationships time period assertion version episode object assertion time transaction time atomic clock tick chronon clock tick deferred assertion deferred transaction pipeline dataset temporal entity integrity temporal referential integrity Part 2 AN INTRODUCTION TO ASSERTED VERSIONING 49 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH CONTENTS The Roots of Asserted Versioning 51 Computer Science Research 54 Clocks and Clock Ticks 55 Time Periods and Date Pairs 56 The Very Concept of Bi-Temporality 63 Allen Relationships 65 Advanced Indexing Strategies 68 Temporal Extensions to SQL 69 Glossary References 72 We begin this chapter with an overview of the three sources of Asserted Versioning: computer science work on temporal data; best practices in the IT profession related to versioning; and original work by the authors themselves. We then spend the rest of this chapter discussing computer science contributions to temporal data management, and the relevance of some of these concepts to Asserted Versioning. The Roots of Asserted Versioning Over the last three decades, the computer science community has done extensive work on temporal data, and especially on bi- temporal data. During that same period of time, the IT commu- nity has developed various forms of versioning, all of which are methods of managing one of the two kinds of uni-temporal data. Asserted Versioning may be thought of as a method of manag- ing both uni- and bi-temporal data which, unlike the standard model of temporal data management, recognizes that rows in bi-temporal tables represent versions of things and that, Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00003-0 Copyright # 2010 Elsevier Inc. All rights of reproduction in any form reserved. 51 consequently, these rows do not stand alone as semantic objects. Versions that are versions of the same thing are related to one another by that very fact. Versions that are versions of the same thing, and also that together represent an unbroken period of time in the life of that thing, are even more closely related to one another. Groups of temporally contiguous versions of the same thing are semantic objects that must bemanaged as single tempo- ral objects, even though they may consist of any number of physi- cal rows. These single semantic objects are what we call episodes. Asserted Versioning may also be thought of as a form of versioning, a technique for managing historical data that has evolved in the IT industry over the last quarter-century. But unlike existing best practice variations on that theme, AssertedVersioning supports the full semantics of versions, i.e. everything that it is con- ceptually possible to do with versions. In addition, Asserted Versioning also integrates the management of versions with the management of assertions andwith themanagement of bi-tempo- ral data. As we pointed out earlier, it directly manages bi-temporal physical tables, andmanages both forms of uni-temporal tables, as well as conventional tables, as views over those bi-temporal tables. Besides embracing contributions from computer science research and from business IT best practices, we believe that Asserted Versioning introduces three new concepts to the field of temporal data management. The first concept is that of an episode. In making episodes central to its management of tem- poral data, Asserted Versioning breaks with a basic component of the relational model, which is that integrity constraints apply only to individual rows. For example, referential integrity, in the relational model, requires that every foreign key in one row expresses a valid relationship which that row has to one other row. In Asserted Versioning, a temporalized foreign key expresses a relationship which one row, called a version, has to an episode, not to any single row which is part of that episode. The second concept is that of the internalization of pipeline datasets. We define a pipeline dataset as any collection of busi- ness data that is not a production table, but that contains data whose destination or origin is such a table.1 Pipeline datasets 1The term “production” indicates that these tables contain “real” data. Regularly scheduled processes are being carried out to maintain these tables, and to keep their contents as accurate, secure and current as possible. Regularly scheduled processes, as well as non-scheduled ones, are being carried out to access this data to obtain needed information. So production tables are the tables that the business tries to keep accurate, current and secure, and from which it draws the information it needs to carry out its mission and meet its objectives. 52 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH which contains data destined for production tables are inflow pipeline datasets. Pipeline datasets which contain data derived from production tables are outflow pipeline datasets. History tables are one example of a pipeline dataset. Sets of transactions, accumulated in files or tables and waiting to be applied to their target tables, are another example. While the use of versions eliminates history tables by internalizing them within the tables whose history they track, the use of deferred transactions and deferred assertions eliminates batch files of transactions waiting to be applied to a database by also inter- nalizing them within their target tables. In this book, we will show how the use of these internalized managed objects reduces the costs of maintaining databases by replacing external files or tables such as history tables, transaction files and logfiles, with structures internalized within production tables rather than being external to them. The third concept is that of encapsulation, as it applies to the management of temporal data. Asserted Versioning fully encapsulates the complexities of maintaining temporal tables by distinguishing between temporal transactions and physical transactions. Temporal transactions are inserts, updates and deletes, as written by their authors, whose targets are asserted version tables. They are submitted to the Asserted Versioning Framework (AVF), not directly to the DBMS. The AVF translates them into physical insert and update transactions which it sub- mits to the DBMS. These physical transactions implement the intentions expressed in those temporal transactions, and we note that, except for temporal insert transactions, one temporal transaction will always be translated into multiple physical transactions. In Part 3, we discuss several temporal transactions, and show both the physical transactions the AVF creates to implement these temporal transactions, and also how the AVF knows how to map between the two. The practical orientation of Asserted Versioning is manifest in its encapsulation of the complexities of temporal data structures and the processes that manage them. Asserted Versioning is an integrated method of managing temporal data which relieves data modelers of the burden of designing and maintaining data models that must explicitly define temporal data structures and integrity constraints on them. It also relieves developers of the burden of designing, writing and maintaining code that enforces the rules which provide the semantics of temporal data. And it relieves query authors, whether developers or end users, of the burden of writing com- plex queries that must explicitly check for temporal gaps or Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 53 overlaps, along one or two temporal dimensions, among a set of rows accessed by the query. The origins of Asserted Versioning in computer science research, in IT best practices, and in contributions from the authors, are illustrated in Figure 3.1. Although deferred trans- actions and deferred assertions are the mechanisms for the internalization of several pipeline datasets, they are shown as a separate item in Figure 3.1 because of their particular importance. Computer Science Research In 2000, Dr. Rick Snodgrass stated that academics had publi- shed, at that time, over 1500 articles on temporal data manage- ment [2000, Snodgrass, xviii]. And over the last decade, much additional work has been done. For our purposes, this work falls into two categories: (i) research relevant to our task of designing and building a method of temporal data management that works with today’s technology; and (ii) research on future directions for the implementation of temporal functions in commercial DBMSs, and for the specification of these functions in extensions to the current SQL standard. Research on bi-temporality within the computer science community The development of versioning within the IT community Asserted Versioning Episodes as managed objects The internalization of pipeline datasetsThe authors’ own work Deferred transactions and deferred assertions Full encapsulation of transactions and of temporal integrity constraints Figure 3.1 The Origins of Asserted Versioning. 54 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH Clocks and Clock Ticks An atomic clock tick is the smallest interval of time recognized by the DBMS that can elapse between any two physical modifications to a database.2 We note that the standard computer science term for an atomic clock tick is a chronon. A clock tick is an interval of time defined on the basis of atomic clock ticks, and that is used in an Asserted Versioning database to delimit the two time periods of rows in asserted version tables, and also to indicate several important points in time. In asserted version tables, clock ticks are used for effective time begin and end dates and for episode begin dates; and atomic clock ticks are used for assertion time begin and end dates, and for row create dates. Clock ticks may be defined at any level of granularity supported by the underlying DBMS. A timestamp is a typical clock tick. A date represents a daily clock tick. A 1-month clock tick represents a situation in which a database is updated from a collection of transactions at most once a month. When time periods measured in clock ticks of different granularities are compared, the technical issues involved in the comparison are complex, even if the conceptual issues are not. In our own implementation of Asserted Versioning, we finesse these issues by requiring that all asserted version tables in the same database use clock ticks of the same granularity. In this book, we assume, albeit somewhat unrealistically, that all asserted version tables use a clock that ticks once amonth (except where otherwise noted), and we will use the first day of each month to represent those clock ticks. This means that changes to these asserted version tables happen on the first of each month and, by the same token, that the state of the database after each monthly update will remain unaltered for the rest of the month. But no matter what granularity we choose for our clock ticks, there is the issue of how clock ticks of one level of granularity are mapped onto those of another level. For example, if we choose a clock tick of one day, when does that day start? Does June 1st, 2012 begin at 2012-06-01-12:00:00.000, or perhaps at 2012-06- 01-12:00:00.001, or perhaps even at 2012-05-31-11:59:59.999? The simplest solution is to let the DBMS determine the mapping for us; and in most cases this is perfectly adequate. But IT data management professionals should at least be aware that issues like these do exist. 2We are not referring here to the cesium-based atomic clock on which standard time is based. An atomic clock tick, in the sense being defined here, is a logical concept, not a physical one. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 55 They should also be aware that another issue exists, that of how SQL timestamps map to when things happen in the real world. SQL uses Universal Coordinated Time (UTC), which is based on cesium clocks, which might lead us to conclude that SQL timestamps are extremely accurate. Precise they may be; but issues of accuracy involved in their use do exist. For example, suppose we know that an astronomical event which has just happened will happen again in exactly 725 days, 5 hours and 23 seconds. If we know the SQL timestamp when the event occurred, can we calculate exactly what the SQL timestamp will be when the event happens again? We can, of course, calculate the number of seconds that will have elapsed between the two events, that number being 62,658,023. But it is a mistake to conclude that from knowing the timestamp of the first event and also knowing the number of seconds until the second event, we can calculate the timestamp of the second event. In fact, we cannot reliably do that. The rea- son is that from the moment the first timestamp is current to the moment that the second timestamp is current, one or more leap seconds may have been added to or subtracted from the count of cesium clock ticks between those two timestamps, and for the same reason that a leap day is added to the Gregorian calen- dar once every four years. But unlike the leap year adjustment, which is regular enough to count on, we do not know, in advance, how many leap seconds might be added or subtracted between now and the next occurrence of that astronomical event, or when those adjustments may occur. The unpredictable adjustment of UTC and its SQL timestamps by means of leap seconds is seldom an issue in busi- ness IT. The reason is that for business purposes, we all tag our data with SQL timestamps, and we all regard those timestamps as an accurate record of when business events happen in the real world. In addition, for most business purposes, we assume that a SQL timestamp plus a defined interval of time will result in a second timestamp that represents when some second event will occur.3 Time Periods and Date Pairs SQL does not recognize a period of time as a managed object. Instead, we have to use a pair of dates. There are four ways we can use a pair of dates to do this. Either the beginning date, or 3A more detailed discussion of how SQL timestamps relate to real-world events is contained in Chapter 3 of Dr. Snodgrass’s book. 56 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH the ending date, or both, may or may not be included in the time period they delimit. If a date is not included, the time period is said to be open on that end; otherwise it is said to be closed. A discussion of all four ways of representing time periods with begin and end dates can be found in the book presenting the standard temporal model [2000, Snodgrass] and also in the book presenting the alternative temporal model [2002, Date, Darwen, Lorentzos]. Here, we discuss just two of the four combinations: closed-open and closed-closed. This is because the closed-open representation is actually the best way of representing time per- iods, whereas the closed-closed representation appears to be the best way of doing that. Consider two versions of the same object, V1 and V2. In both cases, in this example, the first time period for the object starts on 2/19/2011 and ends on 5/22/2011, and the second starts on 5/23/2011 and ends on 10/14/2012.4 Consequently there is no gap in time between these two time periods. 5/22/2011 and 5/23/2011 are contiguous clock ticks. How are we to represent this? Two of the possibilities are shown in Figures 3.2 and 3.3. Given that the first time period starts on 2/19/2011 and ends on 5/22/2011, and that the second time period starts on 5/23/2011 and ends on 10/14/2012, it might seem that the closed-closed rep- resentation is obviously the correct one, and that the closed-open representation is obviously wrong. But that is not the case. If wewanted to argue in support of the closed-closed represen- tation, we could try to show that the closed-open representation V1 D1 2/19/2011 5/23/2011 10/15/2012 5/23/2011 D2 V2 Figure 3.2 A Closed-Open Representation of Two Time Periods. D1 2/19/2011 5/22/2011 10/14/20125/23/2011 V1 V2 D2 Figure 3.3 A Closed-Closed Representation of Two Time Periods. 4In this section, we will assume that our clock ticks once a day. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 57 should not be used by querying for the time period that contains the date 5/23/2011. On the closed-open representation, it might seem that we can’t tell to which version 5/23/2011 belongs. But in fact, we can. We just need the following WHERE clause predi- cate in our SQL query: WHERE D1 <¼ ‘05/23/2011’ AND ‘05/23/2011’ < D2 With this predicate, the query will correctly pick out V2 from Figure 3.2. So one reason we might have thought that the closed-closed representation is right is that its begin and end dates are the same dates we used to set up the example when we said that one period begins and ends on one set of dates and the other period begins and ends on the other set of dates. Another reason we might have thought that the closed-closed representation is right is that we are looking for a pair of dates that a third date is between. Between, in one sense, is on display in the statement “Pick a number between 1 and 10”. We all know that 1 and 10 are both numbers that we can pick. And the SQL BETWEEN operator corresponds to this sense of the word. So if we use the closed- closed representation, we can write: WHERE ‘05/23/2011’ BETWEEN D1 AND D2 In other words, when a closed-closed representation is used, we can rely on SQL’s BETWEEN to express what we ordinarily mean by “between”, which is what we might call the inclusive sense of “between”. But there is another sense of between, which is on display in the statement “The abandoned car is somewhere between mileposts 6 and 10, along I-65N, heading out of Pensacola”. We all know that to find the car, we should start at milepost 6 and continue up to milepost 10. In particular, we know that we don’t need to search past milepost 10, i.e. past the start of the tenth mile heading out of Pensacola. This is the sense of between used in the closed-open conven- tion. The closest English equivalent would be “from . . . . . up to”, in the sense of “from” and then “up to but not including”. But since each SQL predicate returns the correct result, provided each is used with its corresponding method of representing periods of time, each method is correct. What, then, is the advantage of using the closed-open repre- sentation? Well, look again at the two ways of representing V1 and V2. In both cases, V1 and V2 are contiguous. We know this because we have set up the example that way. And with the closed-open representation, we can immediately see that there 58 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH is no gap between V1 and V2. But with the closed-closed repre- sentation, we don’t know whether there is a gap or not. To make that determination, we also need to know at what rate the clock ticks. Assuming for this particular example that the clock ticks once a day, then only when given this additional information can we determine, from a closed-closed representation, that V1 and V2 have no gap between them. This difficulty of determining whether or not two consecutive versions of the same object are contiguous is not just a problem for the human beings trying to understand the data. It is also a problem for the code which will often be called upon to deter- mine, of two consecutive versions, whether or not they are contiguous, i.e. whether or not there is a clock tick between them. With a closed-open representation, the code only has to compare two values to see whether or not they are EQUAL. Making matters worse, the granularity of DBMS timestamp datatypes has changed over time. Originally, these timestamps were seconds. Later, they were expressed as milliseconds, then partial microseconds, then microseconds, and now we are almost at the point where timestamps will be expressed in nanoseconds. With this sliding scale, across DBMSs and DBMS upgrades, we might not even know the granularity at which to try to determine if two versions are or are not contiguous. With a closed-open representation, the code to make this determina- tion is trivial. With a closed-open representation, it may actually be impossible. Once the SQL standards groups can agree on temporal extensions to the SQL standard, part of that agreement will cer- tainly be a way of representing time periods directly, without rely- ing on the confusing circumlocutions of various ways of using pairs of dates to represent time periods. But our concern, in this book, is with today’s SQL, and so we must choose a date-pair method of representation.5 Therefore, from this point forward, we will use the closed-open representation for time periods. 9999 and 12/31/9999 We will use the notation 9999 in our illustrations, to represent the latest date that the DBMS can recognize and manipulate. 5Support for a PERIOD datatype has been introduced by such vendors as Oracle and Teradata. But what that support means may differ from vendor to vendor. Can a unique index be defined on a PERIOD datatype? Which of the Allen relationship comparisons are also supported? So, lacking a standard for the PERIOD datatype, we will continue the practice of defining periods of time in terms of their begin and end points in time. Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 59 That latest date cannot be used in effective or assertion begin dates. If it is used as business data, then it has whatever meaning its users assign to it, which will probably be as the designation of a day which is still a long way off. But if it is used in effective or assertion end dates, it is treated as a date by the DBMS but is not interpreted as a date by the users of the system. Instead, to those users, it means “later than now”. Equivalently, we can say that it means “end of the time period not known, but assumed to not have happened yet”. We will generally use the phrase until further notice to represent this semantics—in effect until further notice, or asserted as true until further notice. When we specify an insert or an update against a conven- tional table, we normally do not know when, if ever, the target row will be next updated or deleted. By the same token, when we specify a temporal insert or a temporal update against a bi-temporal table, we normally do not know when, if ever, the effective time period of the new version will end, because we don’t know when, if ever, a subsequent update or deletion will occur. Some data modelers will argue that NULL should be used for unknown data. While logically, this is true, the performance of the physical model will benefit from this non-null datatype, as will be explained in Chapter 15. Moreover, we do know one thing about this unknown date. For as long as it is unknown, we choose to assume that it has not happened yet. In other words, for as long as it is unknown, we want the DBMS to tell us that it is greater than the value of Now(), whenever we ask that ques- tion. If we used NULL instead of a data value, the DBMS would not give us that answer to that question. If an effective end date is unknown, then, instead of using NULL, we will set it to 12/31/9999, or to whatever value represents the future-most date that the particular DBMS can manage. So the semantics is date unknown (but still in the future), but the implementation is a real date, one in this case that is nearly 8000 years from now. How does a date like that rep- resent semantics like that? First, we assume that the date 12/31/9999 will not be required to represent that far-off New Year’s Eve. So it is available to use for a special purpose. In the case of an effective end date, we often insert or update versions without knowing when the effec- tive time period of the new version may end. So in this case, 12/31/9999 means that the end of the effective time period of the version thus marked is unknown, and that the time period will remain in effect until further notice. For example, when a customer’s address is added to the database, we usually will not know when that customer will move from that address. 60 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH So the status of that address is that it will remain in effect until that customer moves, at some unknown date in the future. In the case of an assertion end date, that date is always set to 12/31/9999 when a row is created. This reflects the assumption that we would never want to assert that a version is true, while also knowing that, at some later point in time, we will change our minds and no longer be willing to make that assertion. While we adopt this assumption in this book, and in the software which is our implementation of these concepts, we note that a more robust semantics might not include this assumption and that, consequently, later releases of our Asserted Versioning Framework may permit non-12/31/9999 assertion end dates to be specified on temporal transactions.6 An effective end date may be changed, i.e. the time period it ends may be shortened or lengthened, as long as the change does not violate temporal integrity constraints, which we will discuss in Part 3. As for an assertion end date, it may be changed for one of four reasons. First, an assertion end date may be changed because we real- ize that the assertion is incorrect, and we have the correction for that error. In that case, the incorrect assertion will cease to be asserted, i.e. it will be withdrawn, as part of the same atomic unit of work that inserts the correcting assertion, and the same date will be used both for the assertion end date of the incorrect assertion and the assertion begin date of the correcting assertion. Second, an assertion end date may be changed because we realize that an assertion is incorrect even though we do not know what the correct data is, or else just because, for some reason, we do not wish to make that assertion any longer. Third, we may conclude that an assertion about the future is no longer action- able, probably because both we and our competitors have more recent and presumably more accurate forecasts. In either case, 6As we briefly allude to, later on, what we assert by means of a row in a table is not so much that the statement made by the row is true. It is, rather, that the statement is actionable. An actionable statement, roughly, is one which is good enough to base a business decision on. Presumably, statements about the past or the present must be true in order to be actionable. But statements about the future, such as financial forecasts, lack a known truth value until the future time about which they are forecasting comes to pass, at which point those statements become either true or false. Such statements may be actionable when first made, and thus be actionable without being true. Moreover, they may be actionable when first made but, over time, become outdated, even prior to becoming true or false. (See, later in this chapter, the section The Very Concept of Bi-Temporality. See also Chapter 13, the section Posted Projections: Past Claims About the Future.) Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 61