Wednesday, November 8, 2017

SQL Tips: Demystifying FORMAT behavior in Datetime vs Time Values

Introduction


Since SQL 2012 , one of the functions which I used quite regularly on my day to day coding has been the FORMAT function.
The major use case has been in formatting date and time based values to custom date and time formats. Though I rely upon front end to do the formatting on most occasions, sometimes we're required to do it at database layer as some tasks would be data feeds like Excel exports where formatted data has to be exported to Excel sheets.
This article clarifies the difference in the behavior of FORMAT function that you shall notice while working with datetime vs time based values

Scenario

Now lets set the scene for the topic on hand

Consider this small code snippet where I'm trying to format a standard datetime value to the format dd MMM yyyy hh:mm:ss AM/PM.

declare @dt datetime = '20171025 13:14:45'

SELECT FORMAT(@dt,'dd MMM yyyy hh:mm:ss tt')

The result is shown below


All is as expected and we are good

Now lets try using FORMAT function on a time value

declare @tm time = '14:34:56'

SELECT FORMAT(@tm,'hh:mm:ss tt')

And the result is as below


This was something totally unexpected. The expectation was to get the time based value like 02:34:56 PM 

Explanation

I decided to do some analysis on FORMAT function to get to the bottom of this problem.

The official documentation of FORMAT function also clearly states supporting date, datetime, datetime2, time datatypes as seen from here


But the catch is in the mapping table

If you check the documentation, time datatype alone is mapped to .Net datatype of TimeSpan while date, datetime, datetime2 are all mapped to DateTime datatype in .Net

The TimeSpan datatype accept custom specifiers as aid out by the article below


On contrast, the acceptable specifiers for DateTime are the below


Reading through the link you can note the below points regarding Format Specifiers

  • TimeSpan doesnt accept : as a valid separator whereas DateTime accepts : as a valid time separator
  • TimeSpan requires using \ as escape character to escape the : seperators
  • TimeSpan accepts only h or hh as specifier for hour part always returned in 24hr clock whereas DateTime accepts both lower and upper case (h,hh,H,HH) with lowercase giving time in 12h and uppercase in 24hr clock
  • DateTime accepts t or tt to return first or full part of AM/PM designator but TimeSpan doesnt support this as it always returns time in 24hr clock 
Based on the above comparison we can see that for getting time value in hh:mm:ss format we need to use FORMAT function like this

declare @tm time = '02:34:56'

SELECT FORMAT(@tm,'hh\:mm\:ss')

And result is as shown



If you want to get format as 02:34:56 PM  you need to cast it to datetime first before applying FORMAT function as per the stated points above
i.e like


declare @tm1 time = '02:34:56'

SELECT FORMAT(CAST(@tm1 as datetime),'hh:mm:ss tt')


And you shall get the intended result as shown


Conclusion

As seen from the illustration above, FORMAT function handles the datetime and time fields differently due to the difference in mapped .Net datatypes.

Tuesday, October 31, 2017

SQL Tips: Exploring Agent_Datetime Function

Introduction


Recently there was a discussion in one of the forums which helped me to learn about a function which has been there for a long time  in SQLServer and had eluded my knowledge till now.
This article explains on the details of the function along with the common use cases.

Agent_Datetime 

The function of interest is agent_datetime.
It resides in the msdb database of a SQLServer instance and accepts two arguments. First argument would be an eight digit integer which represents date portion of a datetime value. The second argument would be a six digit integer representing the time part of the datetime value in hours,minutes and seconds format.

The function can be invoked as below

Select msdb.dbo.agent_datetime(20171027,231134)

The result would be as below



As you see it created a datetime value 2017-10-27 23:11:34.000 from the two integer values passed.

Background

There is a clear intention behind naming the function as Agent_Datetime and placing it within the msdb database.
The function is not officially documented and is basically used internally for converting the date and time values related to sql jobs from msdb system tables.

Use Cases

1. Get SQL Agent related datetime values from integer based columns in msdb catalog objects

There are few columns like for ex. rundate and runtime in sysjobhistory table which represented a date value stored as integer internally.
So query like below



SELECT j.name,dbo.agent_datetime(jh.run_date,jh.run_time) AS RunDate
FROM dbo.sysjobs j
JOIN dbo.sysjobhistory jh
ON jh.job_id = j.job_id

Will give you the list of jobs with their run dates in proper datetime format from msdb database.

2. Generic use to generate datetime values from integer based date and time fields

Though the function is created in msdb we're free to use it in our user databases to merge integer based date and time values to a proper datetime value.

As an illustration see the below


The above screenshot shows three different illustrations regarding use of agent_datetime function with different types of arguments.
The first example passes 8 digit date and 6 digit time based integer values and merges them into a proper datetime value.
The second examples shows what happens when you pass a 4 digit integer value for the time part instead of the expected 6 digits. The function will interpret it in the format mm:ss and returns merged date with the minute and seconds value. If you have only hour and minute parts make sure you pass it in hhmm00 format (i.e adding 0s for the missing seconds)
The final illustration depicts the case where you've only a date part to pass. In this case, you need to pass a default value of '' for the time part when you invoke the agent_datetime function. The function will return full datetime value with time part being default (00:00:00)


Conclusion

As seen from the illustrations above the msdb.dbo.agent_datetime scalar function can be used to generate proper datetime values from two separate integer based date and time values. It can be used for sql agent based catalog queries in msdb as well as for the queries in other databases. 

The only caveat is that since its undocumented its better to refrain from using this on production code especially for the cases where we want the query to be run periodically. 
For adhoc querying we shall use the function as a quick way to generate datetime values from integer date and time equivalents.



Tuesday, October 24, 2017

SQL 2017 Tips: Tweaking IDENTITY_CACHE Setting to Avoid IDENTITY Column Gaps

Introduction

In my discussions with my fellow colleagues as well as on my online interactions with the developer community, I've come across a good amount of cases where people have raised a concern over the IDENTITY behavior in SQLServer.
As you probably know, IDENTITY based columns have a tendency to jump values whenever server restart occurs in the recent versions. Though this doesn't cause any problems regarding the sequence of values being generated, there are some use cases where I've seen this causing an issue for the application especially when the column is being expected to maintain the continuity in values and avoid gaps. Though its certainly a debatable topic on whether

Root Cause

In reality, the cause of this issue is purely intentional and by design.

The IDENTITY values are cached for improving the performance of insert operations on tables with IDENTITY columns. So if we had to dispense with this IDENTITY behavior our only option was to utilize the trace flag 272. This had to be set either using DBCC commands or using -T startup option.

SCOPED CONFIGURATION 


SQL 2016 introduced SCOPED CONFIGURATIONs which allows configuration settings to be applied at a database instance level.
In SQL 2017 there was a new option added to it namely IDENTITY_CACHE which can be used for solving the above issue. Once set for a database, this setting ensures the identity values are not getting cached within the database. This will avoid the problem of IDENTITY column jumping values during the restart of a SQL Server.

Syntax


The scoped configuration setting for IDENTITY_CACHE can be enabled by using a statement as below

ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE=OFF ;

Once above statement is executed the effect would be same as setting the trace flag 272 but at the database level.
This provides an added advantage to us due to the fact that we shall set it at the database level only for the databases where handling gaps in Sequence/Identity columns is a business problem and has to be avoided.
Wherever we need an optimal insert performance and cares little on the gaps in the values, we shall still keep the setting to the default which is ON where it maintains the cache for the values.

Conclusion

As explained above, the introduction of the new option in scoped configuration for controlling the identity cache behavior at the database level is very helpful in avoiding the identity value gaps issue in SQL 2017

References


Saturday, August 5, 2017

SSIS Tips: Handling UTF 8 Based File Data

Introduction

Quite a few times there were cases where we had to transfer CSV data containing non English based characters to SQLServer database. Though it seems straightforward there are few things we need to consider while working with CSV containing non English characters. This post explains the steps that shall be followed while transferring UTF 8 based data from a CSV onto a SQLServer based relational database system

Scenario

Consider the case where we've a CSV file with the below data


As you see from the above the file has a column which stores non English based data. The requirement was to get this data transferred to a SQLServer database for doing further manipulations with them.
Given this requirement our straightforward way would be to use a data flow task with a flat file source and an OLEDB destination to connect to the SQLServer db.
Lets see what happens when we try to use the standard DFT task for this case
Given below is the screenshot of the standard data flow task using non-unicode datatype I tried to use


As you see from the above it complains of data conversion issue between unicode and non-unicode data types. 
Now lets try using an unicode datatype and see what happens


Now we get an error stating that it cant convert between the two code pages. This clearly implies we cant use the default OLEDB destination in the case of UTF - 8 based file data.
Now lets try using the next possible option i.e. with ADO .Net destination. 
The data flow for this case is as below


The column will be of unicode datatype (nchar, nvarchar) in the database.
Once executed you can check the table to see that data is getting populated correctly


As seen from the above results data got transferred correctly this time to the SQLServer table.

Conclusion

By virtue of the above illustration, its clear that for transferring UTF-8 based file data to SQLServer using SSIS we need to do the below

1. Use UTF-8 based codepage (65001) in the Flat File connection manager
2. Use ADO .Net destination task inside data flow task
3. Use a Unicode based datatype for the table column like nchar, nvarchar


References





Tuesday, June 20, 2017

SQL 2017 Tips: Avoiding Gaps in IDENTITY, SEQUENCE Based Column Values

Scenario


One of the common conversations I've come across in a typical work day is as below

Emp X: I just found that the IDENTITY column values for couple of tables have jumped to a new range
Emp Y: Can you check if there was any server restart?
Emp X: Oh yes! I see it now... There was a planned server restart at night.

The reason for this behavior is explicit.

Server restart causes identity based columns to skip values. This is due to the fact that identity columns pre-allocates (caches) next set of values and on server restart the cache gets flushed. Once server comes back it resumes at the next value from the maximum value in the cache which creates gaps in identity sequence. This behavior started from 2012 version onwards.

This blog discusses a new enhancement available in SQL 2017 CTP which can be used to solve the above issue

Solution

This behavior was explained to be by design and proposed solution was to use trace flag 272 which will allow the functionality to continue as per earlier version. The MS connect link which discusses this is given below


For setting this we had to go to SQLServer configuration manager and set trace flag 272 in the startup parameters as discussed here

http://sqlblog.com/blogs/kalen_delaney/archive/2014/06/17/lost-identity.aspx

In SQL 2017, we have a good news. Microsoft has included this as a new option in SCOPED CONFIGURATION options for the database. The option is called IDENTITY_CACHE and is included in the documentation below


This has two advantages over the previous trace flag based method

1. We can set it from management studio itself rather than setting it as a startup option from SQL Configuration Manager.
2. The most important advantage is that we can scope it to database level rather than setting it at the instance level. This makes it possible for us to have other databases co-existing in the same server instance with new behavior enabled i.e. with gaps.

The complete syntax for the feature is as below

ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE=OFF ;

Once you set it at a database level, the server restart doesn't cause any gaps in the IDENTITY objects in the database, which is in line with the behavior before SQL 2012 version.

This small but very useful enhancement will certainly help the developers to set IDENTITY value behavior in a database as per their requirement.