Tuesday, August 2, 2016

What's New in SQL 2016 - Temporal Tables

Introduction

Most of us are familiar with CDC tables in SQL Server. CDC or Change Data Capture is used to track and capture the DML changes happening in a table and makes use of log entries for capturing the information. SQL 2016 provides similar kind of functionality for capturing the history of the data related to memory optimized tables. These historical data capturing feature is referred to as Temporal Tables.

Memory  Optimized Tables were introduced in SQLServer 2014 and they provide an efficient way to store and manipulate data using natively compiled procedures, They provide two levels of durability Schema Only and Schema and Data. Schema Only will only make sure schema is preserved in case of a server restart whereas in the latter case both the schema as well as the data is persisted as in the case of a normal table (disk based).
Temporal tables are only supported in the case where durability option is set to schema and data. These tables will be persisted to disk along with data. When we created a temporal table it will preserve the history and will provide data easily based on our point in time analysis.
The temporal tables makes use of a history table internally to track the history of data changes happening in the memory optimized table. The main table will have two datetime2 type fields which are referred to as period columns. This is used by the table to determine the validity of each record to provide an effective point in time analysis. The date fields determine the validity of a record ie period for which record is(was) valid

Illustration

Consider the below case where we have a database to which we are going to add a memory optimized table and make it a temporal table.

For creating a memory optimized table we need to first have a file group in the database that supports memory optimized data. This can be done by using a series of ALTER DATABASE statements as shown below (Assumption is that SQL2016LocalTest is an already existing database in your SQLServer instance)

ALTER DATABASE [SQL2016LocalTest] ADD FILEGROUP mem_opt CONTAINS MEMORY_OPTIMIZED_DATA   
ALTER DATABASE [SQL2016LocalTest] ADD FILE (name='mem_opt1', filename='') TO FILEGROUP mem_opt   
ALTER DATABASE [SQL2016LocalTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON  
GO  


Once the filegroup is setup as above we shall create the memory optimized table on it using CREATE TABLE statement as below

CREATE TABLE dbo.HotelBooking
(
BookingID int,
RoomNo int,
CustomerName varchar(100),
BookingDate datetime,
ValidFrom datetime2  generated always as row start not null default ('19000101'),
ValidTo datetime2    generated always as row end not null default ('99991231'),
PRIMARY KEY CLUSTERED (BookingID),
period for system_time(ValidFrom, ValidTo)
)
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.BookingHistory));


Once the above statement is executed we will have a temporal table setup with an internal history table which will used for tracking the history of data from the main table.
You can check this by expanding tables under the corresponding databases node to see the tables as per below



As you see the table will be represented with a small clock icon indicating that it is a temporal table and will also have the history table specified under it. The history table will be an exact replica of the main table in schema and will have clustered index on the key column.
Now lets insert some data to this table. For evaluating the history capture its best if you do the data manipulation operation over a period of time.

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1001,
101,
'Peter',
'20150923'
)

UPDATE dbo.HotelBooking 
SET CustomerName = 'Ramu' 
WHERE RoomNo = 101

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1002,
115,
'John',
'20160314'

)

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1005,
212,
'Mary',
'20160416'

)

UPDATE  dbo.HotelBooking
SET BookingDate = '20160420'
WHERE BookingID = 1005

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1011,
307,
'Khader',
'20160617'

)

UPDATE dbo.HotelBooking 
SET CustomerName = 'Michelle' 
WHERE RoomNo = 101

INSERT dbo.HotelBooking
(
BookingID,
RoomNo,
CustomerName,
BookingDate
)
VALUES
(
1015,
115,
'Mohsin',
'20160622'

)

DELETE FROM dbo.HotelBooking
WHERE BookingID = 1002

Once you've done the above data addition/modification you can check the main and history table and you will get the below data snapshot




I have done the data modification over a period of time (around 2 days) to indicate the effect on history table.
If you analyze the above data you can see how the progression of data is getting captured in the history
Everytime a new insert happens it will be captured in the main table HotelBooking with ValidFrom set to the corresponding system time and ValidTo set to the maxdate value (9999-12-31 23:59:59.99999)
Similarly every update operation is carried out by means of two internal operations a delete of the old entry followed by insertion of new entry. This can be seen from resultset above where you will have an entry in both the tables (see records with BookingID 1001 and 1005 in the above case). The history table will have the entry with the values prior to the modification with ValidFrom as actual ValidFrom value from the main entry and ValidTo as the corresponding system time. The main table entry will have the modified values with ValidFrom as the corresponding system time and ValidTo as the max date value.
In the case of delete operations the record will be removed from the main table and there will be an entry added to the history table with ValidFrom as the original ValidFrom and ValidTo as the corresponding system time.
Now that we have got an idea of what happens on the background and how data will get captured in the two tables lets now see some functions which are associated to Temporal tables and which will help us to query temporal data for time based analysis.

AS OF time

AS OF time function gives the snapshot of temporal table data at an instant of time i.e the data that is valid at that instance

Lets try it on our sample table and illustration is below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME AS OF '2016-07-25 05:15'



Now see the last resultset and compare it with the original table vs history entries and you will notice that there are only 3 entries in the resultset.
If you check the ValidFrom and ValidTo values for the resultset entries its pretty evident that those were the entries which were valid as on specified snapshot time (ie ValidFrom < 2016-07-25 05:15 < ValidTo)
 The ignored records were those which were either expired before snapshot date (ex: 1001 Peter) or the ones which became valid after the snapshot time (ex: 1015,1011)
So effectively what AS OF function does is to apply a query logic as below

SELECT * FROM dbo.HotelBooking WHERE '2016-07-25 05:15' BETWEEN ValidFrom  AND ValidTo
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE '2016-07-25 05:15' BETWEEN ValidFrom  AND ValidTo  

Execute this and you will get the same resultset as returned by AS OF

Graphically this can be depicted using the three scenarios as shown below


FROM time1 TO time2

FROM time1 TO time2 function returns all records whose validity overlaps the period specified by time1 to time2. 
The illustration for this can be shown as below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME  FROM '2016-07-24 10:34' TO '2016-07-25 05:16:55.4510553'



The resultset in this consists of all records with the interval ValidFrom to ValidTo overlapping between interval specified by time1 and time2. One thing to note here is that record with BookingID 1011 was not included as it became valid only from the end boundary value whereas function considers only the cases where interval starts or ends within the boundary value ie the equivalent query can be written as

SELECT * FROM dbo.HotelBooking WHERE   ValidFrom < '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE    ValidFrom < '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'

The corresponding scenarios can be represented graphically as below


BETWEEN time1 AND time2


BETWEEN function works quite similar to the FROM function to get the resultset where record interval of ValidFrom to ValidTo  overlaps with the interval specified by time1 to time2. The only difference between BETWEEN and FROM is that BETWEEN considers the end boundary value (time2) also as a part of the interval unlike the FROM function.
So if you see the illustration for BETWEEN it looks like below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME  BETWEEN '2016-07-24 10:34' AND '2016-07-25 05:16:55.4510553'




Check the result and you can see that its very much similar to resultset for FROM with only addition being the record with BookingID 1011 which starts at the same time as the end time of the interval
So equivalent query in this case can be given as

SELECT * FROM dbo.HotelBooking WHERE   ValidFrom <= '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE    ValidFrom <= '2016-07-25 05:16:55.4510553' AND ValidTo > '2016-07-24 10:34'
ORDER BY ValidFrom

Corresponding graphical representation is shown below



CONTAINED IN (time1,time2)

CONTAINED IN function returns a resultset where the record validity falls within the interval from time1 to time2 ie ValidFrom and ValidTo both lying within the interval

Illustration is given below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME  CONTAINED IN ( '2016-07-24 10:32','2016-07-25 05:18:35.3157322') ORDER BY ValidFrom


The resultset on analysis reveals that only those records whose validity falls fully within the interval time1 to time2 are returned by CONTAINED IN function.

The equivalent query is as below

SELECT * FROM dbo.HotelBooking WHERE ValidFrom >= '2016-07-24 10:32' AND ValidTo <= '2016-07-25 05:18:35.3157322'
UNION ALL
SELECT * FROM dbo.BookingHistory WHERE ValidFrom >= '2016-07-24 10:32' AND ValidTo <= '2016-07-25 05:18:35.3157322' ORDER BY ValidFrom

Graphically this can be represented as below


ALL


The ALL function simply combines the results from both the main and history table in the resultset

Illustration below

SELECT * FROM dbo.HotelBooking FOR SYSTEM_TIME ALL   ORDER BY ValidFrom



The resultset includes all records from both the tables

The equivalent query is as below

SELECT * FROM dbo.HotelBooking 

UNION ALL

SELECT * FROM dbo.BookingHistory  ORDER BY ValidFrom

See the corresponding graphical representation below

Conclusion

The above illustrations gives a clear idea about how temporal functions can be used to generate time based results from a temporal table. 
This shows that Temporal tables provide a much convenient way of capturing history and using it for effective time based analysis
Hope you this article was informative enough. Feel free to revert with any comments you may have.

No comments:

Post a Comment