Sunday, October 15, 2017

Triggers, when to use them, when not to use them

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts to see if anything has changed and also to see if I could add some additional info.


Today we are going to look at triggers. Triggers are a great way to keep your database in a consistent state. There are two types of triggers, DML triggers and DDL triggers. DML triggers respond to Data Manipulation Language statements (INSERT, UPDATE, DELETE), while DDL triggers respond to Data Definition Language events.


Some things that DML triggers are used for:
  • Keeping your database from storing wrong data by doing checks that can't be handled with constraints
  • Filling in values that are not supplied and can't be handled through default constraints, since defaults don't fire on updates (a sketch follows this list)
  • Calculating summary values and updating a summary table with them
  • Maintaining an audit trail for DML statements
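
Here is a minimal sketch of the second bullet: an AFTER UPDATE trigger that stamps a LastUpdated column, something a default constraint can't do on updates. The dbo.Orders table, its OrderID key and the LastUpdated column are just placeholders for whatever your own schema looks like.

CREATE TRIGGER trOrders_LastUpdated
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- join to the inserted table so the trigger also works for multi-row updates
    UPDATE o
    SET o.LastUpdated = GETDATE()
    FROM dbo.Orders o
    JOIN inserted i ON o.OrderID = i.OrderID;
    -- recursive triggers are off by default, so this update will not fire the trigger again
END
GO
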
Some things that DDL triggers are used for:
  • Automatically adding columns to a table if they were not included, for example LastUpdated and InsertedBy columns
  • Notifying a DBA when a database has been created, dropped or altered
  • Maintaining an audit trail for DDL statements, capturing every time an object is created, dropped or altered and by whom (a sketch follows this list)
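
Here is a minimal sketch of such an audit trail: a database level DDL trigger that logs every CREATE, ALTER and DROP together with who ran it. The dbo.DDLAudit table is a placeholder you would create yourself (EventDate datetime, LoginName sysname, EventData xml).

CREATE TRIGGER trAuditDDL
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    SET NOCOUNT ON;

    -- EVENTDATA() returns the event details (object name, statement, etc.) as xml
    INSERT dbo.DDLAudit (EventDate, LoginName, EventData)
    SELECT GETDATE(), SUSER_SNAME(), EVENTDATA();
END
GO
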
The most common mistake people make when they first start writing triggers is coding them in such a way that they only work if you insert, update or delete one row at a time. A trigger fires once per statement, not once per row; you have to take this into consideration otherwise your DML statements will blow up. How to do this is explained in the post Coding SQL Server triggers for multi-row operations, there is no point recreating that post here.

Another problem that I see is that some people think a trigger is SQL Server's version of crontab: you will see code that sends email, kicks off jobs and runs stored procedures. This is the wrong approach, a trigger should be lean and mean, it should execute as fast as possible. If you need to do some additional things, then dump some data from the trigger into a processing table and use that table to do your additional tasks (a sketch of that pattern is below). Don't use triggers as a messaging system either, SQL Server comes with Service Broker, use that instead.
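
Here is a minimal sketch of that pattern; the dbo.Orders and dbo.OrderNotificationQueue tables are hypothetical. The trigger only queues a row, and a scheduled job or Service Broker activation procedure would drain the queue and send the actual email outside of the trigger.

CREATE TRIGGER trOrders_QueueNotification
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- keep the trigger lean: just record what needs to be processed later
    INSERT dbo.OrderNotificationQueue (OrderID, QueuedDate, Processed)
    SELECT i.OrderID, GETDATE(), 0
    FROM inserted i;
END
GO
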

Triggers might look like hammers to some people but I guarantee you not everything is a nail....

You could end up with something really difficult to debug: one trigger that kicks off other triggers. Now have fun debugging the trigger hell you got yourself into....or worse, debugging this mess if you inherited it....this is the GOTO spaghetti code of databases.
Since triggers work behind the scenes, you might spend hours debugging something only to find out that a trigger modified the value.
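
If you suspect a trigger is involved, the sys.triggers catalog view will show you every DML trigger in the database and the table it belongs to:

SELECT t.name AS TriggerName,
       OBJECT_NAME(t.parent_id) AS TableName,
       t.is_disabled,
       t.is_instead_of_trigger
FROM sys.triggers t
WHERE t.parent_class_desc = 'OBJECT_OR_COLUMN'
ORDER BY TableName, TriggerName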

One thing I always find interesting is when someone is puzzled by two n rows affected messages when they only did one insert; you know a person like that has not been exposed to triggers yet

Some people will say that you don't need triggers at all and that they do more harm than good. I don't agree with that: triggers have their place, but they should not be abused and overused. The same can be said of views.


Saturday, October 14, 2017

Coding SQL Server triggers for multi-row operations

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts to see if anything has changed and also to see if I could add some additional info.

Today I decided to revisit the post about coding triggers for multi-row operations

There are many forum posts and questions on Stack Overflow where people post their trigger code. However, these triggers are often coded incorrectly because they don't account for multi-row operations.

The one thing you have to remember is that a trigger fires once per statement, not once per row. If you are lucky you will get an error...if you are not lucky you will not get an error, but it might take a while before you notice that you are missing a whole bunch of data

Let's take a look at exactly what happens, first create these two tables



CREATE TABLE Test(id int identity not null primary key, 
   SomeDate datetime not null)
GO

CREATE TABLE  TestHistory(id int  not null, 
   InsertedDate datetime not null)
GO


Now create the following trigger.


CREATE  TRIGGER trTest
    ON Test
    FOR INSERT
    AS
     
    IF @@ROWCOUNT =0
    RETURN
     
    DECLARE @id int
    SET @id = (SELECT id 
    FROM inserted)
    
    INSERT TestHistory (id,InsertedDate)
    SELECT @id, getdate()
    
    GO

The trigger you just created is very simple, it inserts a row into the history table every time an insert happens in the Test table


Run this insert statement which only inserts one row


INSERT Test(SomeDate) values(getdate())

Now run this to see what is in the history table


SELECT * FROM TestHistory


1 2017-10-14 08:49:16.227


That all works fine, what happens when we try to insert 2 rows?



INSERT Test(SomeDate)
SELECT getdate()
UNION ALL
SELECT  dateadd(dd,1,getdate() )


Here is the error.

Server: Msg 512, Level 16, State 1, Procedure trTest, Line 11
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.


As you can see the trigger blew up with an error. Let's try something else.
What would happen if you coded the trigger in this way


ALTER TRIGGER trTest
    ON Test
    FOR INSERT
    AS
     
    IF @@ROWCOUNT =0
    RETURN
     
    DECLARE @id int
    SELECT @id = id 
    FROM inserted
    
    INSERT TestHistory (id,InsertedDate)
    SELECT @id, getdate()
    
    GO


Now insert one row


INSERT Test(SomeDate) VALUES (getdate())

Let's look again at what is in the history table. As you can see we have id 1 and 4; this is because ids 2 and 3 failed and were rolled back when we did the two-row insert earlier



SELECT * FROM TestHistory

1 2017-10-14 08:49:16.227
4 2017-10-14 08:50:37.647


Here is where it gets interesting, run this code


INSERT Test(SomeDate)
SELECT getdate()
UNION ALL
SELECT dateadd(dd,1,getdate() )


That runs fine but when we look now we are missing one of the rows in the history table


SELECT * FROM TestHistory

1 2017-10-14 08:49:16.227
4 2017-10-14 08:50:37.647
5 2017-10-14 08:51:06.270


Let's try that same insert statement again


INSERT Test(SomeDate)
SELECT getdate()
UNION ALL
SELECT dateadd(dd,1,getdate() )

Now we are again missing a row in the history table


SELECT * FROM TestHistory

1 2017-10-14 08:49:16.227
4 2017-10-14 08:50:37.647
5 2017-10-14 08:51:06.270
7 2017-10-14 08:52:09.447


The problem is with this line of code


SELECT @id = id FROM inserted


@id will only hold the value for one of the rows that was returned in the result set


Here is how you would change the trigger to work correctly



ALTER TRIGGER trTest
    ON Test
    FOR INSERT
    AS
     
    IF @@ROWCOUNT =0
    RETURN
     
        
    INSERT TestHistory (id,InsertedDate)
    SELECT id, getdate()
    FROM inserted
    
GO


Now run the single insert statement again


INSERT Test(SomeDate) VALUES (getdate())


That row was inserted, we can check the history table to see what is there now



SELECT * FROM TestHistory

1 2017-10-14 08:49:16.227
4 2017-10-14 08:50:37.647
5 2017-10-14 08:51:06.270
7 2017-10-14 08:52:09.447
9 2017-10-14 08:52:57.990


Finally, we can again test with the insert statement that will insert 2 rows


INSERT Test(SomeDate)
SELECT getdate()
UNION ALL
SELECT dateadd(dd,1,getdate() )


Let's check the history table again


SELECT  * FROM TestHistory

1 2017-10-14 08:49:16.227
4 2017-10-14 08:50:37.647
5 2017-10-14 08:51:06.270
7 2017-10-14 08:52:09.447
9 2017-10-14 08:52:57.990
11 2017-10-14 08:53:40.693
10 2017-10-14 08:53:40.693

And as you can see both rows were inserted into the history table

So what is worse in this case? The error message, or the fact that the code didn't blow up but the insert wasn't working correctly? I'll take an error message any time over the other problem.

Wednesday, October 11, 2017

Sargable Queries...

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts to see if anything has changed and also to see if I could add some additional info.

Today we are going to look at sargable queries. You might ask yourself, what is this weird term sargable? Sargable comes from searchable argument, sometimes also written as Search ARGument ABLE. What it means is that the query can use an index: a seek will be performed instead of a scan. In general, any time you have a function wrapped around a column, an index won't be used


Some examples that are not sargable
WHERE LEFT(Name,1) = 'S'
WHERE Year(SomeDate) = 2012
WHERE OrderID * 3 = 33000


Those three should be rewritten like this in order to become sargable

WHERE Name LIKE 'S%'
WHERE SomeDate >= '20120101' AND SomeDate < '20130101'
WHERE OrderID = 33000/3
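
Here is a quick sketch of the date example; the Orders table, OrderID and OrderDate columns are hypothetical. The point is that an index on OrderDate can only be seeked by the range version of the query, the YEAR() version forces a scan.

CREATE TABLE Orders (OrderID int identity not null primary key,
   OrderDate datetime not null)
CREATE INDEX ix_Orders_OrderDate ON Orders(OrderDate)
GO

-- not sargable: the function wrapped around the column forces a scan
SELECT OrderID FROM Orders
WHERE YEAR(OrderDate) = 2012

-- sargable: a range on the bare column allows an index seek
SELECT OrderID FROM Orders
WHERE OrderDate >= '20120101' AND OrderDate < '20130101'
GO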


Let's create a table, insert some data so that we can look at the execution plan
Create this simple table

CREATE TABLE Test(SomeID varchar(100))


Let's insert some data that will start with a letter followed by some digits

INSERT Test
SELECT LEFT(v2.type,1) +RIGHT('0000' + CONVERT(varchar(4),v1.number),4) 
FROM master..spt_values v1
CROSS JOIN (SELECT DISTINCT LEFT(type,1) AS type 
FROM master..spt_values) v2
WHERE v1.type = 'p'


That insert should have generated 32768 rows


Now create this index on that table

CREATE CLUSTERED INDEX cx_test ON Test(SomeID)

Let's take a look at the execution plan. Hit CTRL + M; this will include the execution plan once the query is done running

SELECT * FROM Test
WHERE SomeID LIKE 's%'

SELECT * FROM Test
WHERE LEFT(SomeID,1) = 's'

Here is what the plans look like


As you can see, the cost is split 9% versus 91% between the two queries; that is a big difference.
Hit CTRL + M again to disable the inclusion of the plan

Run this code block, it will give you the plans in a text format

SET SHOWPLAN_TEXT ON
GO

SELECT * FROM Test
WHERE SomeID LIKE 's%'

SELECT * FROM Test
WHERE LEFT(SomeID,1) = 's'
GO

SET SHOWPLAN_TEXT OFF
GO

Here are the two plans
|--Clustered Index Seek(OBJECT:([master].[dbo].[Test].[cx_test]),
SEEK:([master].[dbo].[Test].[SomeID] >= 'Rþ' AND [master].[dbo].[Test].[SomeID] < 'T'),
WHERE:([master].[dbo].[Test].[SomeID] like 's%') ORDERED FORWARD)
|--Clustered Index Scan(OBJECT:([master].[dbo].[Test].[cx_test]),
WHERE:(substring([master].[dbo].[Test].[SomeID],(1),(1))='s'))
As you can see, the top one, while looking more complicated, is actually giving you a seek

Making a case sensitive search sargable

Now let's take a look at how we can make a case sensitive search sargable as well.
In order to do a case sensitive search, you have to have a case sensitive collation; if your table was not created with a case sensitive collation, then you can supply one as part of the query.
Here is an example to demonstrate what I mean


This is a simple table created without an explicit collation, so it picks up the database default (which is usually case insensitive)

CREATE TABLE TempCase1 (Val CHAR(1))
INSERT TempCase1 VALUES('A')
INSERT TempCase1 VALUES('a')

Running this select statement will return both rows

SELECT * FROM TempCase1
WHERE Val = 'A' 

Val
-----
A
a

Now create the same kind of table but with a case sensitive collation


CREATE TABLE TempCase2 (Val CHAR(1) COLLATE SQL_Latin1_General_CP1_CS_AS)
INSERT TempCase2 VALUES('A')
INSERT TempCase2 VALUES('a')
Run the same query

SELECT * FROM TempCase2
WHERE Val = 'A' 

Val
-----
A


As you can see you only get the one row now that matches the case

To return both rows, you can supply the case insensitive collation in the query itself

SELECT * FROM TempCase1
WHERE Val = 'A' COLLATE SQL_Latin1_General_CP1_CI_AS


Val
-----
A
a


Now let's take a look at how we can make the case sensitive search sargable
First create this table and insert some data


CREATE TABLE TempCase (Val CHAR(1))
 
INSERT TempCase VALUES('A')
INSERT TempCase VALUES('B')
INSERT TempCase VALUES('C')
INSERT TempCase VALUES('D')
INSERT TempCase VALUES('E')
INSERT TempCase VALUES('F')
INSERT TempCase VALUES('G')
INSERT TempCase VALUES('H')


Now we will insert some lowercase characters

INSERT TempCase
SELECT LOWER(Val) FROM TempCase


Now we will create our real table which will have 65536 rows

CREATE TABLE CaseSensitiveSearch (Val VARCHAR(50))

We will do a couple of cross joins to generate the data for our queries

INSERT CaseSensitiveSearch
SELECT t1.val + t2.val + t3.val + t4.val
FROM TempCase t1
CROSS JOIN TempCase t2
CROSS JOIN TempCase t3
CROSS JOIN TempCase t4

Create an index on the table

CREATE INDEX IX_SearchVal ON CaseSensitiveSearch(Val)

Just like before, if we run this we will get back the exact value we passed in and also all the upper case and lower case variations

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' 

Here are the results of that query
Val
-----
AbCd
ABcd
Abcd
ABCd
aBCd
abCd
aBcd
abcd
abCD
aBcD
abcD
aBCD
ABCD
AbCD
ABcD
AbcD


If you add the case sensitive collation to the query, you will get only what matches your value

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS

Here is the result, it matches what was passed in
Val
---
ABCD


The problem with the query above is that it will cause a scan. So what can we do, how can we make it perform better? It is simple... combine the two queries

First grab all the rows that match regardless of case with a regular (case insensitive) predicate that can seek the index, then filter those rows down to the exact case with the case sensitive comparison.
Here is what that query will look like

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

The AND Val LIKE 'ABCD' part results in a seek, and the Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS part then makes sure only the rows that exactly match your value are returned

If you run both queries, you can look at the plan difference (hit CTRL + M so that the plan is included)

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS



SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

Here is the plan


As you can see, there is a big difference between the two
Here is the plan in text as well

SET SHOWPLAN_TEXT ON
GO
 
SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS



SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

GO
 
SET SHOWPLAN_TEXT OFF
GO
|--Table Scan(OBJECT:([tempdb].[dbo].[CaseSensitiveSearch]),
WHERE:(CONVERT_IMPLICIT(varchar(50),[tempdb].[dbo].[CaseSensitiveSearch].[Val],0)=CONVERT(varchar(8000),[@1],0)))
|--Index Seek(OBJECT:([tempdb].[dbo].[CaseSensitiveSearch].[IX_SearchVal]), SEEK:([tempdb].[dbo].[CaseSensitiveSearch].[Val] >= 'ABCD'
AND [tempdb].[dbo].[CaseSensitiveSearch].[Val] <= 'ABCD'),
WHERE:(CONVERT_IMPLICIT(varchar(50),[tempdb].[dbo].[CaseSensitiveSearch].[Val],0)='ABCD' AND [tempdb].[dbo].[CaseSensitiveSearch].[Val] like 'ABCD') ORDERED FORWARD)

I really wish Microsoft would take the time to internally rewrite queries like these two when they hit the optimizer


WHERE LEFT(Name,1) = 'S'
WHERE Year(SomeDate) = 2012



It should rewrite them like this and then performance would be much better

WHERE Name LIKE 'S%'
WHERE SomeDate >= '20120101' AND SomeDate < '20130101'

I think there are probably some SQL Server consultants cursing me now for even trying to suggest this :-)

Thursday, October 05, 2017

Data types storage differences

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts to see if anything has changed and also to see if I could add some additional info.


Today we are going to take a look at how data types can have an impact on your queries and also on the size of your database.

Char vs NChar

SQL Server has two families of data types to store character data[1], and both come in fixed and variable length versions. The char and varchar data types use one byte of storage per character; the nchar and nvarchar data types use two bytes of storage per character. The nchar and nvarchar data types are used to store Unicode data.
Let's think about that for a second: the char and varchar data types can store twice the number of characters in the same amount of storage as the nchar and nvarchar data types. Why does this matter, space is cheap right? True, space is getting cheaper, but we are also storing more and more data every year.


Now think about what happens when you have everything stored as Unicode data
  • What happens to your backup and restore process, will it be faster or slower, and will the files be bigger if not compressed?
  • What about transferring results to and from your database server, can the packets hold the same number of characters?
  • What about the amount of data on a page, what does this do to indexes and index lookups, and how does it affect index maintenance?

If you don't need it, then don't use Unicode data.
Some examples of what I have seen stored in nchar and nvarchar columns when really you shouldn't:
Zip codes for US addresses
US addresses
Social Security Numbers (which were stored in plain text no less)
Integer data (enforced by constraints or the app layer to make sure these were only digits)



Let's take a quick look by running some T-SQL
First create these two tables


CREATE TABLE TestChar (SomeCol char(10))
GO

CREATE TABLE TestNChar (SomeCol nchar(10))
GO

CREATE index ix_test on TestChar(SomeCol)
GO
CREATE index ix_test on TestNChar(SomeCol)
GO
Now populate each with some data


INSERT TestChar
SELECT TOP 1000000 '1234567890'
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4
GO

INSERT TestNChar
SELECT TOP 1000000 '1234567890'
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4
GO
Let's see how much space is used by each table

EXEC sp_spaceused 'TestChar'
EXEC sp_spaceused 'TestNChar'
TestChar (reserved):  42768 KB
TestNChar (reserved): 62736 KB

If you look at the reserved column, you will see that the nchar data is using about 20 MB more than the char data


Implicit conversions

Besides the storage increase, there is also a problem when querying for data that looks like varchar but is stored as Unicode. Run the code below.


SET SHOWPLAN_TEXT ON
GO
DECLARE @v varchar(10) = '0123456789'

SELECT * FROM TestChar WHERE SomeCol LIKE  @v +'%'
GO

SET SHOWPLAN_TEXT OFF
GO

Here is the plan for that query
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1010]))
       |--Compute Scalar(DEFINE:([Expr1008]=LikeRangeStart([@v]+'%'), [Expr1009]=LikeRangeEnd([@v]+'%'), [Expr1010]=LikeRangeInfo([@v]+'%')))
       |    |--Constant Scan
       |--Index Seek(OBJECT:([Performance].[dbo].[TestChar].[ix_test]), SEEK:([Performance].[dbo].[TestChar].[SomeCol] > [Expr1008] AND [Performance].[dbo].[TestChar].[SomeCol] < [Expr1009]),  WHERE:([Performance].[dbo].[TestChar].[SomeCol] like [@v]+'%') ORDERED FORWARD)
If we look at the plan we can see that this looks pretty good.
People will sometimes change the data type of a column but will not change any code that accesses this column. Let's now change the data type of the column to nchar


DROP INDEX TestChar.ix_test
GO

ALTER TABLE TestChar ALTER COLUMN SomeCol nchar(10)
GO

CREATE INDEX ix_test on TestChar(SomeCol)
GO
Run the query that gives you the plan again



SET SHOWPLAN_TEXT ON
GO
DECLARE @v varchar(10) = '0123456789'

SELECT * FROM TestChar WHERE SomeCol LIKE  @v +'%'
GO

SET SHOWPLAN_TEXT OFF
GO

Here is the plan
  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1010]))
       |--Compute Scalar(DEFINE:([Expr1008]=LikeRangeStart(CONVERT_IMPLICIT(nvarchar(11),[@v]+'%',0)), [Expr1009]=LikeRangeEnd(CONVERT_IMPLICIT(nvarchar(11),[@v]+'%',0)), [Expr1010]=LikeRangeInfo(CONVERT_IMPLICIT(nvarchar(11),[@v]+'%',0))))
       |    |--Constant Scan
       |--Index Seek(OBJECT:([Performance].[dbo].[TestChar].[ix_test]), SEEK:([Performance].[dbo].[TestChar].[SomeCol] > [Expr1008] AND [Performance].[dbo].[TestChar].[SomeCol] < [Expr1009]),  WHERE:([Performance].[dbo].[TestChar].[SomeCol] like CONVERT_IMPLICIT(nvarchar(11),[@v]+'%',0)) ORDERED FORWARD)
As you can see, there is now an implicit conversion going on.
In order to get rid of the conversion, use the correct data type for the variable



SET SHOWPLAN_TEXT ON
GO
DECLARE @v nvarchar(10) = '0123456789'

SELECT * FROM TestChar WHERE SomeCol LIKE  @v +'%'
GO

SET SHOWPLAN_TEXT OFF
GO

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1010]))
       |--Compute Scalar(DEFINE:([Expr1008]=LikeRangeStart([@v]+N'%'), [Expr1009]=LikeRangeEnd([@v]+N'%'), [Expr1010]=LikeRangeInfo([@v]+N'%')))
       |    |--Constant Scan
       |--Index Seek(OBJECT:([Performance].[dbo].[TestChar].[ix_test]), SEEK:([Performance].[dbo].[TestChar].[SomeCol] > [Expr1008] AND [Performance].[dbo].[TestChar].[SomeCol] < [Expr1009]),  WHERE:([Performance].[dbo].[TestChar].[SomeCol] like [@v]+N'%') ORDERED FORWARD)



Implicit conversions were also an issue when ORMs first burst onto the scene. If you used NHibernate or LINQ to SQL with .NET, since strings in .NET are Unicode, all text would be sent over as Unicode and you would see all kinds of conversions.


Using larger data types when it is not needed

I see this problem mostly with the integer data types. Below is a list of the integer data types together with their storage size and range


tinyint
Storage size is 1 byte. Integer data from 0 through 255.
smallint
Storage size is 2 bytes. Integer data from -2^15 (-32,768) through 2^15 - 1 (32,767).
int
Storage size is 4 bytes. Integer data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
bigint
Storage size is 8 bytes. Integer data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807).
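
To see the difference in practice, here is a minimal sketch along the same lines as the char/nchar test above; the TestInt and TestBigInt tables are just throwaway examples.

CREATE TABLE TestInt (SomeID int not null)
GO
CREATE TABLE TestBigInt (SomeID bigint not null)
GO

INSERT TestInt
SELECT TOP 1000000 1
FROM sys.sysobjects c1
CROSS JOIN sys.sysobjects c2
CROSS JOIN sys.sysobjects c3
CROSS JOIN sys.sysobjects c4
GO

INSERT TestBigInt
SELECT SomeID FROM TestInt
GO

-- compare the data and reserved columns, the bigint table
-- needs 4 extra bytes for every one of the million rows
EXEC sp_spaceused 'TestInt'
EXEC sp_spaceused 'TestBigInt'
GO
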


Now imagine Facebook with a billion users decided to use bigint as the CountryID in their Country table, and this key is then used as a foreign key in the user demographics table. This is wasteful: either use a smallint, since we won't go past 32 thousand countries in the foreseeable future, or use the 2 or 3 character ISO country code.

The problem is even worse if you have a compound 6 column key and it is used as a foreign key in tons of other tables...that was real fun to clean up....use a surrogate 1 column key in that case...but be sure to test....normalize till it hurts, then denormalize till it works....I will cover normalization in another post, I just wanted to mention it here



[1] I know there are also text and ntext but those are deprecated