
Managing time in relational databases- P4
Mô tả tài liệu
Tham khảo tài liệu 'managing time in relational databases- p4', công nghệ thông tin, cơ sở dữ liệu phục vụ nhu cầu học tập, nghiên cứu và làm việc hiệu quả
Tóm tắt nội dung
approach is better at tracking changes to persistent objects and
to relationships other than metric balances.
State Temporal Data: Uni-Temporal
and Bi-Temporal Data
At this point in our discussion, we are concerned with state
data rather than with event data, and with state data that is
queryable rather than state data that needs to be reconstructed.
What then are the various options for managing temporal
queryable state data?
First of all, we need to recognize that there are two kinds of
states to manage. One is the state of the things we are interested
in, the states those things pass through as they change over time.
But there is another kind of state, that being the state of the data
itself. Data, such as rows in tables, can be in one of two states:
correct or incorrect. (As we will see in Chapter 12, it can also
be in a third state, one in which it is neither correct nor incor-
rect.) Version tables and assertion tables record, respectively,
the state of objects and the state of our data about those objects.
Uni-Temporal State Data
In a conventional Customer table, each row represents the cur-
rent state of a customer. Each time the state of a customer
changes, i.e. each time a row is updated, the old data is overwritten
with the new data. By adding one (or sometimes two) date(s) or
timestamp(s) to the primary key of the table, it becomes a uni-
temporal table. But since we already know that there are two dif-
ferent temporal dimensions that can be associated with data, we
know to ask “What kind of uni-temporal table?”
As we saw in the Preface, there are uni-temporal version
tables and uni-temporal assertion tables. Version tables keep
track of changes that happen in the real world, changes to the
objects represented in those tables. Each change is recorded as
a new version of an object. Assertion tables keep track of correct-
ions we have made to data we later discovered to be in error.
Each correction is recorded as a new assertion about the object.
The versions make up a true history of what happened to those
objects. The assertions make up a virtual logfile of corrections
to the data in the table.
Usually, when table-level temporal data is discussed, the
tables turn out to be version tables, not assertion tables. In their
book describing the alternative temporal model [2002, Date,
Darwen, Lorentzos], the authors focus on uni-temporal
versioned data. Bi-temporality is not even alluded to until the
Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS 41
penultimate chapter, at which point it is suggested that “logged
time history” tables be used to manage the other temporal
dimension. Since bi-temporality receives only a passing mention
in that book, we choose to classify the alternative temporal
model as a uni-temporal model.
In IT best practices for managing temporal data—which we
will discuss in detail in Chapter 4—once again the temporal
tables are version tables, and error correction is an issue that is
mostly left to take care of itself.4 For the most part, it does so
by overwriting incorrect data.5 This is why we classify IT best
practices as uni-temporal models.
The Alternative Temporal Model
What we call the alternative temporal model was developed
by Chris Date, Hugh Darwen and Dr. Nikos Lorentzos in their
book Temporal Data and the Relational Model (Morgan-
Kaufmann, 2002).6 This model is based in large part on tec-
hniques developed by Dr. Lorentzos to manage temporal data
by breaking temporal durations down into temporally atomic
components, applying various transformations to those compo-
nents, and then re-assembling the components back into those
temporal durations—a technique, as the authors note, whose
applicability is not restricted to temporal data.
As we said, except for the penultimate chapter in that book,
the entire book is a discussion of uni-temporal versioned tables.
In that chapter, the authors recommend that if there is a require-
ment to keep track of the assertion time history of a table (which
they call “logged-time history”), it be implemented by means of
an auxiliary table which is maintained by the DBMS.
4Lacking criteria to distinguish the best from the rest, the term “best practices” has
come to mean little more than “standard practices”. What we call “best practices”, and
which we discuss in Chapter 4, are standard practices we have seen used by many of
our clients.
5An even worse solution is to mix up versions and assertions by creating a new row,
with a begin date of Now(), both every time there is a real change, and also every time
there is an error in the data to correct. When that happens, we no longer have a history
of the changes things went through, because we cannot distinguish versions from
corrections. And we no longer have a “virtual logfile” of corrections because we don’t
know how far back the corrections should actually have taken effect.
6The word “model”, as used here and also in the phrases “alternative model” and
“Asserted Versioning model” obviously doesn’t refer to a data model of specific subject
matter. It means something like theory, but with an emphasis on its applicability to
real-world problems. So “the relational model”, as we use the term, for example,
means something like “relational theory as implemented in current relational
technology”.
42 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS
In addition, these authors do not attempt, in their book, to
explain how this method of managing temporal data would work
with current relational technology. Like much of the computer
science research on temporal data, they allude to SQL operators
and other constructs that do not yet exist, and so their book is in
large part a recommendation to the standards committees to
adopt the changes to the SQL language which they describe.
Because our own concern is with how to implement temporal
concepts with today’s technologies, and also with how to sup-
port both kinds of uni-temporal data, as well as fully bi-temporal
data, we will have little more to say about the alternative tempo-
ral model in this book.
Best Practices
Over several decades, a best practice has emerged in manag-
ing temporal queryable state data. It is to manage this kind of
data by versioning otherwise conventional tables. The result is
versioned tables which, logically speaking, are tables which com-
bine the history tables and current tables described previously.
Past, present and future states of customers, for example, are
kept in one and the same Customer table. Corrections may or
may not be flagged; but if they are not, it will be impossible to
distinguish versions created because something about a cus-
tomer changed from versions created because past customer
data was entered incorrectly. On the other hand, if they are
flagged, the management and use of these flags will quickly
become difficult and confusing.
There are many variations on the theme of versioning, which
we have grouped into four major categories. We will discuss
them in Chapter 4.
The IT community has always used the term “version” for this
kind of uni-temporal data. And this terminology seems to reflect
an awareness of an important concept that, as we shall see, is cen-
tral to the Asserted Versioning approach to temporal data. For the
term “version” naturally raises the question “Aversion of what?”, to
which our answer is “A version of anything that can persist and
change over time”. This is the concept of a persistent object, and
it is, most fundamentally, what Asserted Versioning is about.
Bi-Temporal State Data
We now come to our second option, which is to manage
both versions and assertions and, most importantly, their
interdependencies. This is bi-temporal data management, the
subject of both Dr. Rick Snodgrass’s book [2000, Snodgrass] and
of our book.
Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS 43
The Standard Temporal Model
What we call the standard temporal model was developed
by Dr. Rick Snodgrass in his book Developing Time-Oriented
Database Applications in SQL (Morgan-Kaufmann, 2000).
Based on the computer science work current at that time,
and especially on the work Dr. Snodgrass and others had done
on the TSQL (temporal SQL) proposal to the SQL standards
committees, it shows how to implement both uni-temporal
and bi-temporal data management using then-current DBMSs
and then-current SQL.
We emphasize that, as we are writing, Dr. Snodgrass’s book is
a decade old. We use it as our baseline view of computer science
work on bi-temporal data because most of the computer science
literature exists in the form of articles in scientific journals that
are not readily accessible to many IT professionals. We also
emphasize that Dr. Snodgrass did not write that book as a com-
pendium of computer science research for an IT audience.
Instead, he wrote it as a description of how some of that research
could be adapted to provide a means of managing bi-temporal
data with the SQL and the DBMSs available at that time.
One of the greatest strengths of the standard model is that it
discusses and illustrates both the maintenance and the querying
of temporal data at the level of SQL statements. For example, it
shows us the kind of code that is needed to apply the temporal
analogues of entity integrity and referential integrity to temporal
data. And for any readers who might think that temporal data
management is just a small step beyond the versioning they
are already familiar with, many of the constraint-checking
SQL statements shown in Dr. Snodgrass’s book should suffice
to disabuse them of that notion.
The Asserted Versioning Temporal Model
What we call the Asserted Versioning temporal model is our
own approach to managing temporal data. Like the standard
model, it attempts to manage temporal data with current tech-
nology and current SQL.
The Asserted Versioning model of uni-temporal and bi-tem-
poral data management supports all of the functionality of the
standard model. In addition, it extends the standard model’s
notion of transaction time by permitting data to be physically
added to a table prior to the time when that data will appear
in the table as production data, available for use. This is done
by means of deferred transactions, which result in deferred
assertions, those being the inserted, updated or logically deleted
44 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS
rows resulting from those transactions.7 Deferred assertions,
although physically co-located in the same tables as other data,
will not be immediately available to normal queries. But once
time in the real world reaches the beginning of their assertion
periods, they will, by that very fact, become currently asserted
data, part of the production data that makes up the database
as it is perceived by its users.
We emphasize that deferred assertions are not the same thing
as rows describing what things will be like at some time in the
future. Those latter rows are current claims about what things
will be like in the future. They are ontologically post-dated.
Deferred assertions are rows describing what things were, are,
or will be like, but rows which we are not yet willing to claim
make true statements. They are epistemologically post-dated.
Another way that Asserted Versioning differs from the stan-
dard temporal model is in the encapsulation and simplification
of integrity constraints. The encapsulation of integrity con-
straints is made possible by distinguishing temporal transactions
from physical transactions. Temporal transactions are the ones
that users write. The corresponding physical transactions are
what the DBMS applies to asserted version tables. The Asserted
Versioning Framework (AVF) uses an API to accept temporal
transactions. Once it validates them, the AVF translates each
temporal transaction into one or more physical transactions.
By means of triggers generated from a combination of a logical
data model together with supplementary metadata, the AVF
enforces temporal semantic constraints as it submits physical
transactions to the DBMS.
The simplification of these integrity constraints is made possi-
ble by introducing the concept of an episode. With non-temporal
tables, a row representing an object can be inserted into that table
at some point in time, and later deleted from the table. After it is
deleted, of course, that table no longer contains the information
that the row was ever present. Corresponding to the period of
time during which that row existed in that non-temporal table,
there would be an episode in an asserted version table, consisting
of one or more temporally contiguous rows for the same object.
So an episode of an object in an asserted version table is in effect
during exactly the period of time that a row for that object would
exist in a non-temporal table. And just as a deletion in a conven-
tional table can sometime later be followed by the insertion of a
new row with the same primary key, the termination of an
7The term “deferred transaction” was suggested by Dr. Snodgrass during a series of
email exchanges which the authors had with him in the summer of 2008.
Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS 45
episode in an assertion version table can sometime later be
followed by the insertion of a new episode for the same object.
In a non-temporal table, each row must conform to entity
integrity and referential integrity constraints. In an asserted ver-
sion table, each version must conform to temporal entity integ-
rity and temporal referential integrity constraints. As we will
see, the parallels are in more than name only. Temporal entity
integrity really is entity integrity applied to temporal data. Tem-
poral referential integrity really is referential integrity applied to
temporal data.
Glossary References
Glossary entries whose definitions form strong inter-
dependencies are grouped together in the following list. The
same glossary entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the
semantic perspective of each chapter. There will usually be sev-
eral other, and often many other, glossary entries that are not
included in the list, and we recommend that the Glossary be
consulted whenever an unfamiliar term is encountered.
as-is
as-was
Asserted Versioning
Asserted Versioning Framework (AVF)
episode
persistent object
state
thing
physical transaction
temporal transaction
temporal entity integrity (TEI)
temporal referential integrity (TRI)
the alternative temporal model
the Asserted Versioning temporal model
the standard temporal model
46 Chapter 2 A TAXONOMY OF BI-TEMPORAL DATA MANAGEMENT METHODS
PART
2
AN INTRODUCTION TO
ASSERTED VERSIONING
Chapter Contents
3. The Origins of Asserted Versioning: Computer Science Research 51
4. The Origins of Asserted Versioning: The Best Practices 75
5. The Core Concepts of Asserted Versioning 95
6. Diagrams and Other Notations 119
7. The Basic Scenario 141
Part 1 provided the context for Asserted Versioning, a history
and a taxonomy of various ways in which temporal data has
been managed over the last several decades. Here in Part 2, we
introduce Asserted Versioning itself and prepare the way for
the detailed discussion in Part 3 of how Asserted Versioning
actually works.
In Chapter 3, we discuss the origins of Asserted Versioning in
computer science research. Based on the work of computer
scientists, we introduce the concepts of a clock tick and an
atomic clock tick, the latter of which, in their terminology, is
called a chronon. We go on to discuss the various ways in which
time periods are represented by pairs of dates or of timestamps,
since SQL does not directly support the concept of a time period.
There are only a finite number of ways that two time periods
can be situated, with respect to one another, along a common
Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00024-8
Copyright # 2010 Elsevier Inc. All rights of reproduction in any form reserved. 47
timeline. For example, one time period may entirely precede or
entirely follow another, they may partially overlap or be identi-
cal, they may start at different times but end at the same time,
and so on. These different relationships among pairs of time per-
iods have been identified and catalogued, and are called the
Allen relationships. They will play an important role in our
discussions of Asserted Versioning because there are various
ways in which we will want to compare time periods. With the
Allen relationships as a completeness check, we can make sure
that we have considered all the possibilities.
Another important section of this chapter discusses the dif-
ference between the computer science notion of transaction
time, and our own notion of assertion time. This difference is
based on our development of the concepts of deferred trans-
actions and deferred assertions, and for their subsumption under
the more general concept of a pipeline dataset.
In Chapter 4, we discuss the origins of Asserted Versioning in
IT best practices, specifically those related to versioning. We
believe that these practices are variations on four basic methods
of versioning data. In this chapter, we present each of these
methods by means of examples which include sample tables
and a running commentary on how inserts, updates and deletes
affect the data in those tables.
In Chapter 5, we present the conceptual foundations of
Asserted Versioning. The core concepts of objects, episodes, vers-
ions and assertions are defined, a discussion which leads us to
the fundamental statement of Asserted Versioning, that every
row in an asserted version table is the assertion of a version of
an episode of an object. We continue on to discuss how time
periods are represented in asserted version tables, how temporal
entity integrity and temporal referential integrity enforce the core
semantics of Asserted Versioning, and finally how Asserted
Versioning internalizes the complexities of temporal data
management.
In Chapter 6, we introduce the schema common to all
asserted version tables, as well as various diagrams and notations
that will be used in the rest of the book. We also introduce the
topic of how Asserted Versioning supports the dynamic views
that hide the complexities of that schema from query authors
who would otherwise likely be confused by that complexity.
When an object is represented by a row in a non-temporal
table, the sequence of events begins with the insertion of that
row, continues with zero or more updates, and either continues
on with no further activity, or ends when the row is eventually
deleted. When an object is represented in an asserted version
48 Part 2 AN INTRODUCTION TO ASSERTED VERSIONING
table, the result includes one row corresponding to the insert in
the non-temporal table, additional rows corresponding to the
updates to the original row in the non-temporal table, and an
additional row if a delete eventually takes place. This sequence
of events constitutes what we call the basic scenario of activity
against both conventional and asserted version tables. In Chap-
ter 7, we describe how the basic scenario works when the target
of that activity is an asserted version table.
Glossary References
Glossary entries whose definitions form strong inter-
dependencies are grouped together in the following list. The
same Glossary entries may be grouped together in different ways
at the end of different chapters, each grouping reflecting the
semantic perspective of each chapter. There will usually be sev-
eral other, and often many other, Glossary entries that are not
included in the list, and we recommend that the Glossary be
consulted whenever an unfamiliar term is encountered.
Allen relationships
time period
assertion
version
episode
object
assertion time
transaction time
atomic clock tick
chronon
clock tick
deferred assertion
deferred transaction
pipeline dataset
temporal entity integrity
temporal referential integrity
Part 2 AN INTRODUCTION TO ASSERTED VERSIONING 49
3
THE ORIGINS OF ASSERTED
VERSIONING: COMPUTER
SCIENCE RESEARCH
CONTENTS
The Roots of Asserted Versioning 51
Computer Science Research 54
Clocks and Clock Ticks 55
Time Periods and Date Pairs 56
The Very Concept of Bi-Temporality 63
Allen Relationships 65
Advanced Indexing Strategies 68
Temporal Extensions to SQL 69
Glossary References 72
We begin this chapter with an overview of the three sources of
Asserted Versioning: computer science work on temporal data;
best practices in the IT profession related to versioning; and
original work by the authors themselves. We then spend the rest
of this chapter discussing computer science contributions to
temporal data management, and the relevance of some of these
concepts to Asserted Versioning.
The Roots of Asserted Versioning
Over the last three decades, the computer science community
has done extensive work on temporal data, and especially on bi-
temporal data. During that same period of time, the IT commu-
nity has developed various forms of versioning, all of which are
methods of managing one of the two kinds of uni-temporal data.
Asserted Versioning may be thought of as a method of manag-
ing both uni- and bi-temporal data which, unlike the standard
model of temporal data management, recognizes that rows
in bi-temporal tables represent versions of things and that,
Managing Time in Relational Databases. Doi: 10.1016/B978-0-12-375041-9.00003-0
Copyright # 2010 Elsevier Inc. All rights of reproduction in any form reserved. 51
consequently, these rows do not stand alone as semantic objects.
Versions that are versions of the same thing are related to one
another by that very fact. Versions that are versions of the same
thing, and also that together represent an unbroken period of time
in the life of that thing, are even more closely related to one
another. Groups of temporally contiguous versions of the same
thing are semantic objects that must bemanaged as single tempo-
ral objects, even though they may consist of any number of physi-
cal rows. These single semantic objects are what we call episodes.
Asserted Versioning may also be thought of as a form of
versioning, a technique for managing historical data that has
evolved in the IT industry over the last quarter-century. But unlike
existing best practice variations on that theme, AssertedVersioning
supports the full semantics of versions, i.e. everything that it is con-
ceptually possible to do with versions. In addition, Asserted
Versioning also integrates the management of versions with the
management of assertions andwith themanagement of bi-tempo-
ral data. As we pointed out earlier, it directly manages bi-temporal
physical tables, andmanages both forms of uni-temporal tables, as
well as conventional tables, as views over those bi-temporal tables.
Besides embracing contributions from computer science
research and from business IT best practices, we believe that
Asserted Versioning introduces three new concepts to the field
of temporal data management. The first concept is that of an
episode. In making episodes central to its management of tem-
poral data, Asserted Versioning breaks with a basic component
of the relational model, which is that integrity constraints apply
only to individual rows. For example, referential integrity, in the
relational model, requires that every foreign key in one row
expresses a valid relationship which that row has to one other
row. In Asserted Versioning, a temporalized foreign key expresses
a relationship which one row, called a version, has to an episode,
not to any single row which is part of that episode.
The second concept is that of the internalization of pipeline
datasets. We define a pipeline dataset as any collection of busi-
ness data that is not a production table, but that contains data
whose destination or origin is such a table.1 Pipeline datasets
1The term “production” indicates that these tables contain “real” data. Regularly
scheduled processes are being carried out to maintain these tables, and to keep their
contents as accurate, secure and current as possible. Regularly scheduled processes, as
well as non-scheduled ones, are being carried out to access this data to obtain needed
information. So production tables are the tables that the business tries to keep
accurate, current and secure, and from which it draws the information it needs to
carry out its mission and meet its objectives.
52 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH
which contains data destined for production tables are inflow
pipeline datasets. Pipeline datasets which contain data derived
from production tables are outflow pipeline datasets.
History tables are one example of a pipeline dataset. Sets of
transactions, accumulated in files or tables and waiting to be
applied to their target tables, are another example. While the
use of versions eliminates history tables by internalizing them
within the tables whose history they track, the use of deferred
transactions and deferred assertions eliminates batch files of
transactions waiting to be applied to a database by also inter-
nalizing them within their target tables. In this book, we will
show how the use of these internalized managed objects reduces
the costs of maintaining databases by replacing external files or
tables such as history tables, transaction files and logfiles, with
structures internalized within production tables rather than
being external to them.
The third concept is that of encapsulation, as it applies to the
management of temporal data. Asserted Versioning fully
encapsulates the complexities of maintaining temporal tables
by distinguishing between temporal transactions and physical
transactions. Temporal transactions are inserts, updates and
deletes, as written by their authors, whose targets are asserted
version tables. They are submitted to the Asserted Versioning
Framework (AVF), not directly to the DBMS. The AVF translates
them into physical insert and update transactions which it sub-
mits to the DBMS. These physical transactions implement the
intentions expressed in those temporal transactions, and we
note that, except for temporal insert transactions, one temporal
transaction will always be translated into multiple physical
transactions. In Part 3, we discuss several temporal transactions,
and show both the physical transactions the AVF creates to
implement these temporal transactions, and also how the AVF
knows how to map between the two.
The practical orientation of Asserted Versioning is manifest
in its encapsulation of the complexities of temporal data
structures and the processes that manage them. Asserted
Versioning is an integrated method of managing temporal data
which relieves data modelers of the burden of designing and
maintaining data models that must explicitly define temporal
data structures and integrity constraints on them. It also
relieves developers of the burden of designing, writing and
maintaining code that enforces the rules which provide the
semantics of temporal data. And it relieves query authors,
whether developers or end users, of the burden of writing com-
plex queries that must explicitly check for temporal gaps or
Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 53
overlaps, along one or two temporal dimensions, among a set
of rows accessed by the query.
The origins of Asserted Versioning in computer science
research, in IT best practices, and in contributions from the
authors, are illustrated in Figure 3.1. Although deferred trans-
actions and deferred assertions are the mechanisms for the
internalization of several pipeline datasets, they are shown as
a separate item in Figure 3.1 because of their particular
importance.
Computer Science Research
In 2000, Dr. Rick Snodgrass stated that academics had publi-
shed, at that time, over 1500 articles on temporal data manage-
ment [2000, Snodgrass, xviii]. And over the last decade, much
additional work has been done. For our purposes, this work falls
into two categories: (i) research relevant to our task of designing
and building a method of temporal data management that works
with today’s technology; and (ii) research on future directions
for the implementation of temporal functions in commercial
DBMSs, and for the specification of these functions in
extensions to the current SQL standard.
Research on bi-temporality within the
computer science community
The development of versioning within
the IT community
Asserted
Versioning
Episodes as managed objects
The internalization of pipeline
datasetsThe
authors’
own
work
Deferred transactions and deferred
assertions
Full encapsulation of transactions and
of temporal integrity constraints
Figure 3.1 The Origins of Asserted Versioning.
54 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH
Clocks and Clock Ticks
An atomic clock tick is the smallest interval of time recognized
by the DBMS that can elapse between any two physical
modifications to a database.2 We note that the standard computer
science term for an atomic clock tick is a chronon. A clock tick is
an interval of time defined on the basis of atomic clock ticks,
and that is used in an Asserted Versioning database to delimit
the two time periods of rows in asserted version tables, and also
to indicate several important points in time. In asserted version
tables, clock ticks are used for effective time begin and end dates
and for episode begin dates; and atomic clock ticks are used for
assertion time begin and end dates, and for row create dates.
Clock ticks may be defined at any level of granularity
supported by the underlying DBMS. A timestamp is a typical
clock tick. A date represents a daily clock tick. A 1-month clock
tick represents a situation in which a database is updated from
a collection of transactions at most once a month.
When time periods measured in clock ticks of different
granularities are compared, the technical issues involved in the
comparison are complex, even if the conceptual issues are not.
In our own implementation of Asserted Versioning, we finesse
these issues by requiring that all asserted version tables in the
same database use clock ticks of the same granularity. In this
book, we assume, albeit somewhat unrealistically, that all
asserted version tables use a clock that ticks once amonth (except
where otherwise noted), and we will use the first day of each
month to represent those clock ticks. This means that changes
to these asserted version tables happen on the first of each month
and, by the same token, that the state of the database after each
monthly update will remain unaltered for the rest of the month.
But no matter what granularity we choose for our clock ticks,
there is the issue of how clock ticks of one level of granularity are
mapped onto those of another level. For example, if we choose a
clock tick of one day, when does that day start? Does June 1st,
2012 begin at 2012-06-01-12:00:00.000, or perhaps at 2012-06-
01-12:00:00.001, or perhaps even at 2012-05-31-11:59:59.999?
The simplest solution is to let the DBMS determine the mapping
for us; and in most cases this is perfectly adequate. But IT data
management professionals should at least be aware that issues
like these do exist.
2We are not referring here to the cesium-based atomic clock on which standard time is
based. An atomic clock tick, in the sense being defined here, is a logical concept, not a
physical one.
Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 55
They should also be aware that another issue exists, that of
how SQL timestamps map to when things happen in the real
world. SQL uses Universal Coordinated Time (UTC), which is
based on cesium clocks, which might lead us to conclude that
SQL timestamps are extremely accurate. Precise they may be;
but issues of accuracy involved in their use do exist.
For example, suppose we know that an astronomical event
which has just happened will happen again in exactly 725 days,
5 hours and 23 seconds. If we know the SQL timestamp when
the event occurred, can we calculate exactly what the SQL
timestamp will be when the event happens again? We can, of
course, calculate the number of seconds that will have elapsed
between the two events, that number being 62,658,023.
But it is a mistake to conclude that from knowing the
timestamp of the first event and also knowing the number of
seconds until the second event, we can calculate the timestamp
of the second event. In fact, we cannot reliably do that. The rea-
son is that from the moment the first timestamp is current to the
moment that the second timestamp is current, one or more leap
seconds may have been added to or subtracted from the count
of cesium clock ticks between those two timestamps, and for
the same reason that a leap day is added to the Gregorian calen-
dar once every four years. But unlike the leap year adjustment,
which is regular enough to count on, we do not know, in
advance, how many leap seconds might be added or subtracted
between now and the next occurrence of that astronomical
event, or when those adjustments may occur.
The unpredictable adjustment of UTC and its SQL
timestamps by means of leap seconds is seldom an issue in busi-
ness IT. The reason is that for business purposes, we all tag our
data with SQL timestamps, and we all regard those timestamps
as an accurate record of when business events happen in the real
world. In addition, for most business purposes, we assume that a
SQL timestamp plus a defined interval of time will result in a
second timestamp that represents when some second event will
occur.3
Time Periods and Date Pairs
SQL does not recognize a period of time as a managed object.
Instead, we have to use a pair of dates. There are four ways we
can use a pair of dates to do this. Either the beginning date, or
3A more detailed discussion of how SQL timestamps relate to real-world events is
contained in Chapter 3 of Dr. Snodgrass’s book.
56 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH
the ending date, or both, may or may not be included in the time
period they delimit. If a date is not included, the time period is
said to be open on that end; otherwise it is said to be closed.
A discussion of all four ways of representing time periods with
begin and end dates can be found in the book presenting the
standard temporal model [2000, Snodgrass] and also in the book
presenting the alternative temporal model [2002, Date, Darwen,
Lorentzos]. Here, we discuss just two of the four combinations:
closed-open and closed-closed. This is because the closed-open
representation is actually the best way of representing time per-
iods, whereas the closed-closed representation appears to be the
best way of doing that.
Consider two versions of the same object, V1 and V2. In both
cases, in this example, the first time period for the object starts
on 2/19/2011 and ends on 5/22/2011, and the second starts on
5/23/2011 and ends on 10/14/2012.4 Consequently there is no
gap in time between these two time periods. 5/22/2011 and
5/23/2011 are contiguous clock ticks.
How are we to represent this? Two of the possibilities are
shown in Figures 3.2 and 3.3.
Given that the first time period starts on 2/19/2011 and ends on
5/22/2011, and that the second time period starts on 5/23/2011
and ends on 10/14/2012, it might seem that the closed-closed rep-
resentation is obviously the correct one, and that the closed-open
representation is obviously wrong. But that is not the case.
If wewanted to argue in support of the closed-closed represen-
tation, we could try to show that the closed-open representation
V1
D1
2/19/2011
5/23/2011 10/15/2012
5/23/2011
D2
V2
Figure 3.2 A Closed-Open Representation of Two Time Periods.
D1
2/19/2011 5/22/2011
10/14/20125/23/2011
V1
V2
D2
Figure 3.3 A Closed-Closed Representation of Two Time Periods.
4In this section, we will assume that our clock ticks once a day.
Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 57
should not be used by querying for the time period that contains
the date 5/23/2011. On the closed-open representation, it might
seem that we can’t tell to which version 5/23/2011 belongs. But
in fact, we can. We just need the following WHERE clause predi-
cate in our SQL query:
WHERE D1 <¼ ‘05/23/2011’ AND ‘05/23/2011’ < D2
With this predicate, the query will correctly pick out V2 from
Figure 3.2.
So one reason we might have thought that the closed-closed
representation is right is that its begin and end dates are the same
dates we used to set up the example when we said that one period
begins and ends on one set of dates and the other period begins
and ends on the other set of dates. Another reason we might have
thought that the closed-closed representation is right is that we
are looking for a pair of dates that a third date is between.
Between, in one sense, is on display in the statement “Pick a
number between 1 and 10”. We all know that 1 and 10 are both
numbers that we can pick. And the SQL BETWEEN operator
corresponds to this sense of the word. So if we use the closed-
closed representation, we can write:
WHERE ‘05/23/2011’ BETWEEN D1 AND D2
In other words, when a closed-closed representation is used,
we can rely on SQL’s BETWEEN to express what we ordinarily
mean by “between”, which is what we might call the inclusive
sense of “between”.
But there is another sense of between, which is on display in
the statement “The abandoned car is somewhere between
mileposts 6 and 10, along I-65N, heading out of Pensacola”. We
all know that to find the car, we should start at milepost 6 and
continue up to milepost 10. In particular, we know that we don’t
need to search past milepost 10, i.e. past the start of the tenth
mile heading out of Pensacola.
This is the sense of between used in the closed-open conven-
tion. The closest English equivalent would be “from . . . . . up to”,
in the sense of “from” and then “up to but not including”. But
since each SQL predicate returns the correct result, provided
each is used with its corresponding method of representing
periods of time, each method is correct.
What, then, is the advantage of using the closed-open repre-
sentation? Well, look again at the two ways of representing V1
and V2. In both cases, V1 and V2 are contiguous. We know this
because we have set up the example that way. And with the
closed-open representation, we can immediately see that there
58 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH
is no gap between V1 and V2. But with the closed-closed repre-
sentation, we don’t know whether there is a gap or not. To make
that determination, we also need to know at what rate the clock
ticks. Assuming for this particular example that the clock ticks
once a day, then only when given this additional information
can we determine, from a closed-closed representation, that
V1 and V2 have no gap between them.
This difficulty of determining whether or not two consecutive
versions of the same object are contiguous is not just a problem
for the human beings trying to understand the data. It is also a
problem for the code which will often be called upon to deter-
mine, of two consecutive versions, whether or not they are
contiguous, i.e. whether or not there is a clock tick between
them. With a closed-open representation, the code only has to
compare two values to see whether or not they are EQUAL.
Making matters worse, the granularity of DBMS timestamp
datatypes has changed over time. Originally, these timestamps
were seconds. Later, they were expressed as milliseconds, then
partial microseconds, then microseconds, and now we are
almost at the point where timestamps will be expressed in
nanoseconds. With this sliding scale, across DBMSs and DBMS
upgrades, we might not even know the granularity at which to
try to determine if two versions are or are not contiguous. With
a closed-open representation, the code to make this determina-
tion is trivial. With a closed-open representation, it may actually
be impossible.
Once the SQL standards groups can agree on temporal
extensions to the SQL standard, part of that agreement will cer-
tainly be a way of representing time periods directly, without rely-
ing on the confusing circumlocutions of various ways of using
pairs of dates to represent time periods. But our concern, in this
book, is with today’s SQL, and so we must choose a date-pair
method of representation.5 Therefore, from this point forward,
we will use the closed-open representation for time periods.
9999 and 12/31/9999
We will use the notation 9999 in our illustrations, to represent
the latest date that the DBMS can recognize and manipulate.
5Support for a PERIOD datatype has been introduced by such vendors as Oracle and
Teradata. But what that support means may differ from vendor to vendor. Can a
unique index be defined on a PERIOD datatype? Which of the Allen relationship
comparisons are also supported? So, lacking a standard for the PERIOD datatype, we
will continue the practice of defining periods of time in terms of their begin and end
points in time.
Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 59
That latest date cannot be used in effective or assertion begin
dates. If it is used as business data, then it has whatever meaning
its users assign to it, which will probably be as the designation of
a day which is still a long way off. But if it is used in effective or
assertion end dates, it is treated as a date by the DBMS but is not
interpreted as a date by the users of the system. Instead, to those
users, it means “later than now”. Equivalently, we can say that it
means “end of the time period not known, but assumed to not
have happened yet”. We will generally use the phrase until
further notice to represent this semantics—in effect until further
notice, or asserted as true until further notice.
When we specify an insert or an update against a conven-
tional table, we normally do not know when, if ever, the target
row will be next updated or deleted. By the same token, when
we specify a temporal insert or a temporal update against a
bi-temporal table, we normally do not know when, if ever, the
effective time period of the new version will end, because we don’t
know when, if ever, a subsequent update or deletion will occur.
Some data modelers will argue that NULL should be used for
unknown data. While logically, this is true, the performance of
the physical model will benefit from this non-null datatype, as
will be explained in Chapter 15. Moreover, we do know one thing
about this unknown date. For as long as it is unknown, we
choose to assume that it has not happened yet. In other words,
for as long as it is unknown, we want the DBMS to tell us that
it is greater than the value of Now(), whenever we ask that ques-
tion. If we used NULL instead of a data value, the DBMS would
not give us that answer to that question.
If an effective end date is unknown, then, instead of using
NULL, we will set it to 12/31/9999, or to whatever value
represents the future-most date that the particular DBMS can
manage. So the semantics is date unknown (but still in the
future), but the implementation is a real date, one in this case
that is nearly 8000 years from now. How does a date like that rep-
resent semantics like that?
First, we assume that the date 12/31/9999 will not be required
to represent that far-off New Year’s Eve. So it is available to use
for a special purpose. In the case of an effective end date, we
often insert or update versions without knowing when the effec-
tive time period of the new version may end. So in this case,
12/31/9999 means that the end of the effective time period of
the version thus marked is unknown, and that the time period
will remain in effect until further notice. For example, when a
customer’s address is added to the database, we usually will
not know when that customer will move from that address.
60 Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH
So the status of that address is that it will remain in effect until
that customer moves, at some unknown date in the future.
In the case of an assertion end date, that date is always set to
12/31/9999 when a row is created. This reflects the assumption
that we would never want to assert that a version is true, while
also knowing that, at some later point in time, we will change
our minds and no longer be willing to make that assertion. While
we adopt this assumption in this book, and in the software
which is our implementation of these concepts, we note that a
more robust semantics might not include this assumption and
that, consequently, later releases of our Asserted Versioning
Framework may permit non-12/31/9999 assertion end dates to
be specified on temporal transactions.6
An effective end date may be changed, i.e. the time period it
ends may be shortened or lengthened, as long as the change
does not violate temporal integrity constraints, which we will
discuss in Part 3. As for an assertion end date, it may be changed
for one of four reasons.
First, an assertion end date may be changed because we real-
ize that the assertion is incorrect, and we have the correction for
that error. In that case, the incorrect assertion will cease to be
asserted, i.e. it will be withdrawn, as part of the same atomic
unit of work that inserts the correcting assertion, and the same
date will be used both for the assertion end date of the incorrect
assertion and the assertion begin date of the correcting
assertion.
Second, an assertion end date may be changed because we
realize that an assertion is incorrect even though we do not know
what the correct data is, or else just because, for some reason, we
do not wish to make that assertion any longer. Third, we may
conclude that an assertion about the future is no longer action-
able, probably because both we and our competitors have more
recent and presumably more accurate forecasts. In either case,
6As we briefly allude to, later on, what we assert by means of a row in a table is not so
much that the statement made by the row is true. It is, rather, that the statement is
actionable. An actionable statement, roughly, is one which is good enough to base a
business decision on. Presumably, statements about the past or the present must be
true in order to be actionable. But statements about the future, such as financial
forecasts, lack a known truth value until the future time about which they are
forecasting comes to pass, at which point those statements become either true or false.
Such statements may be actionable when first made, and thus be actionable without
being true. Moreover, they may be actionable when first made but, over time, become
outdated, even prior to becoming true or false. (See, later in this chapter, the section
The Very Concept of Bi-Temporality. See also Chapter 13, the section Posted
Projections: Past Claims About the Future.)
Chapter 3 THE ORIGINS OF ASSERTED VERSIONING: COMPUTER SCIENCE RESEARCH 61