Monday, March 15, 2021

After 20+ years in IT .. I finally discovered this useful command

Very similar to my After 20+ years in IT .. I finally discovered this... post, I discovered yet another command I should have known about

Originally I was not going to write this post, but after I found out that several other people didn't know this, I figured what the heck, why not, maybe someone will think this is cool as well


Open up a command prompt or PowerShell window, navigate to a folder, type in tree... and hit enter

Here is what I see

Tree dos command

I was watching a Pluralsight course and the person typed in the tree command.. and I was like whoaaaa.. How do I not know this? Perhaps because I don't use the command window all that much? Anyway, I thought that this was pretty cool

As you can see, tree lists all the directories and subdirectories in a tree-like structure. This is great for quickly seeing all the directories in one shot
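If you also want to see the files inside each directory, tree takes a couple of switches: /F includes the files under each directory, and /A uses ASCII characters instead of the line-drawing characters, which is handy when redirecting the output to a file. For example (the file name here is just an example):

tree /F
tree /A /F > treeoutput.txt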

The same command will work in PowerShell


tree command in Powershell

Finally, here is an image of the command window output as well as the explorer navigation pane side by side


command window output as well as the explorer navigation pane side by side

Hopefully this will be useful to someone




Friday, May 22, 2020

PostgreSQL adds FETCH FIRST WITH TIES.. just like TOP n WITH TIES in SQL Server

PostgreSQL 13 Beta 1 was released yesterday; you can read the release notes here

https://www.postgresql.org/about/news/2040/

One thing that caught my eye was this statement in the release notes

PostgreSQL 13 brings more convenience to writing queries with features like FETCH FIRST WITH TIES, which returns any additional rows that match the last row.

This is essentially TOP WITH TIES in SQL Server, which I believe has been around since at least version 7. How many times have I used it in code that was deployed in the last 20 years? I believe I have used WITH TIES only once. It does make for great interview questions and SQL puzzles :-)
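For reference, here is roughly what the new syntax looks like in PostgreSQL 13; the snake_case table below is hypothetical, a PostgreSQL-flavored version of the SQL Server example that follows:

SELECT grade_average, student
FROM top_example
ORDER BY grade_average DESC
FETCH FIRST 4 ROWS WITH TIES;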


So let's take a quick look at how TOP WITH TIES works in SQL Server.  The first thing we will do is look at what Books On Line says about TOP


WITH TIES Returns two or more rows that tie for last place in the limited results set. You must use this argument with the ORDER BY clause. WITH TIES might cause more rows to be returned than the value specified in expression. For example, if expression is set to 5 but two additional rows match the values of the ORDER BY columns in row 5, the result set will contain seven rows. You can specify the TOP clause with the WITH TIES argument only in SELECT statements, and only if you've also specified the ORDER BY clause. The returned order of tying records is arbitrary. ORDER BY doesn't affect this rule.

Time to get started and write some code to see this in action

First create this table of students and insert some data

CREATE TABLE #TopExample(GradeAverage int, Student varchar(100))
INSERT #TopExample VALUES(99.00,'Plato'),
      (98,'Socrates'),
      (95,'Diogenes the Cynic'),
      (94,'Antisthenes'),
      (94,'Demetrius'),
      (50,'Denis')

As you can see, I am not a very good student  :-(

If you do a regular TOP 4 query like this

SELECT TOP 4 GradeAverage, Student 
FROM #TopExample  
ORDER BY GradeAverage DESC

You will get back these results

GradeAverage Student
99          Plato
98          Socrates
95          Diogenes the Cynic
94          Demetrius


As you can see, we are missing the other student with a grade of 94, Antisthenes

This is easily fixed by adding WITH TIES to the query

SELECT TOP 4 WITH TIES GradeAverage, Student 
FROM #TopExample 
ORDER BY GradeAverage DESC

Now you will get back these results; as you can see, you have 5 rows and both rows with a grade average of 94 are included

GradeAverage Student
99          Plato
98          Socrates
95          Diogenes the Cynic
94          Demetrius
94          Antisthenes


Another way to do the same as WITH TIES is by using DENSE_RANK.  That query looks like this

;WITH c AS (SELECT DENSE_RANK() OVER (ORDER BY GradeAverage DESC) AS dens, 
 GradeAverage,Student 
 FROM #TopExample)

SELECT GradeAverage, Student 
FROM c WHERE dens <=4
ORDER BY GradeAverage DESC 

You will get back the same results again: 5 rows, with both rows with a grade average of 94 included

GradeAverage Student
99          Plato
98          Socrates
95          Diogenes the Cynic
94          Demetrius
94          Antisthenes

Using DENSE_RANK is a bit more code, but if portability is a concern, it might be a better choice

There you go, a post about a feature you will never use :-)

If you want to run all the queries in one shot here is all the code



CREATE TABLE #TopExample(GradeAverage int, Student varchar(100))
INSERT #TopExample VALUES(99.00,'Plato'),
      (98.00,'Socrates'),
      (95.00,'Diogenes the Cynic'),
      (94.00,'Antisthenes'),
      (94.00,'Demetrius'),
      (50.00,'Denis')

SELECT TOP 4 GradeAverage, Student 
FROM #TopExample  
ORDER BY GradeAverage DESC

SELECT TOP 4 WITH TIES GradeAverage, Student 
FROM #TopExample 
ORDER BY GradeAverage DESC

;WITH c AS (SELECT DENSE_RANK() OVER (ORDER BY GradeAverage DESC) AS dens, 
 GradeAverage,Student 
 FROM #TopExample)

SELECT GradeAverage, Student 
FROM c WHERE dens <=4
ORDER BY GradeAverage DESC 


DROP TABLE #TopExample


And here is what it all looks like in SSMS, code and output



PostgreSQL adds FETCH FIRST WITH TIES.. just like TOP n WITH TIES in SQL Server Query in SSMS

Wednesday, May 06, 2020

You know about waitfor delay but did you know there is a waitfor time?


I was looking at some code I wrote the other day and noticed the WAITFOR command.. This got me thinking. How many times have I used WAITFOR in code? Probably about as much as I have used NTILE :-)

I looked at the documentation for WAITFOR and noticed there is TIME in addition to DELAY. Oh, that is handy; I always rolled my own ghetto-style version by calculating how long it would be until a specific time and then using that in the WAITFOR DELAY command

Why would you use the WAITFOR command?

The WAITFOR command can be used to delay the execution of a command for a specific duration or until a specific time occurs. From Books On Line, the description is as follows:



Blocks the execution of a batch, stored procedure, or transaction until either a specified time or time interval elapses, or a specified statement modifies or returns at least one row.

WAITFOR   
{  
    DELAY 'time_to_pass'   
  | TIME 'time_to_execute'   
  | [ ( receive_statement ) | ( get_conversation_group_statement ) ]   
    [ , TIMEOUT timeout ]  
}  



Arguments
DELAY
Is the specified period of time that must pass, up to a maximum of 24 hours, before execution of a batch, stored procedure, or transaction proceeds.

'time_to_pass'
Is the period of time to wait. time_to_pass can be specified either in a datetime data format, or as a local variable. Dates can't be specified, so the date part of the datetime value isn't allowed. time_to_pass is formatted as hh:mm[[:ss].mss].

TIME
Is the specified time when the batch, stored procedure, or transaction runs.

'time_to_execute'
Is the time at which the WAITFOR statement finishes. time_to_execute can be specified in a datetime data format, or it can be specified as a local variable. Dates can't be specified, so the date part of the datetime value isn't allowed. time_to_execute is formatted as hh:mm[[:ss].mss] and can optionally include the date of 1900-01-01.


WAITFOR with a receive_statement or get_conversation_group_statement is applicable only to Service Broker messages. I will not cover those in this post



I must admit that I only use these commands a couple of times a year when running something ad hoc. In code, I will use WAITFOR DELAY when doing a back fill of data into a replicated table. In that case I will batch the data, and after each batch completes I will pause for a second or so. The reason I do this is that I don't want to increase replication latency; after all, I am a nice guy


WAITFOR TIME 

Let's take a look how you would use the WAITFOR command. I will start with WAITFOR TIME


The command is very easy... if you want the PRINT command to run at 09:57:16, you would do the following

WAITFOR TIME '09:57:16'
PRINT 'DONE  ' 

The seconds are optional; if you want it to run at 09:57 (9 hours and 57 minutes), you can do the following

WAITFOR TIME '09:57'
PRINT 'DONE  ' 

One thing to know is that you can't grab the output from a time data type and use that in your WAITFOR TIME command. The following will blow up

SELECT CONVERT(time,  getdate()) --'09:57:16.9600000'

WAITFOR TIME '09:57:16.9600000'

Msg 148, Level 15, State 1, Line 32
Incorrect time syntax in time string '09:57:16.9600000' used with WAITFOR.

What you need to do is strip everything after the dot.

We need the command to be the following


WAITFOR TIME '09:57:16'

There are two ways to accomplish this... the first is by using PARSENAME. I have blogged about that function several times, the first time here: Ten SQL Server Functions That You Hardly Use But Should

All you have to do is tell SQL Server which part you want: if you pass 1 as the second argument to PARSENAME, you will get everything after the dot; if you pass 2, you will get everything before the dot.


SELECT  PARSENAME('09:57:16.9600000',2), 
PARSENAME('09:57:16.9600000',1)

This returns  the following

09:57:16     9600000


The easiest way would have been to just use time(0) instead

SELECT CONVERT(time,  getdate()) ,--'09:57:16.9600000'
CONVERT(time(0),  getdate())  --'09:57:16'

Below is a complete example that will wait 10 seconds before running the final PRINT statement if you run the whole code block in one shot.

Also notice that I use a variable with the WAITFOR TIME command. The caveat is that the variable can't be a time datatype; this is why I use a varchar datatype and store the value of the time datatype in it. The reason I use the time datatype in my procs is so that I don't have to do a lot of validation when someone calls the proc. If they pass in a string that can't be converted... the proc won't even run; it will fail right at the proc call itself


DECLARE @DelayTime time(0)= GETDATE()
PRINT @DelayTime

SELECT @DelayTime =DATEADD(second,10,@DelayTime)

PRINT @DelayTime

DECLARE @d varchar(100) = @DelayTime -- you have to use varchar in command
WAITFOR TIME @d 

-- Run your command here
PRINT 'DONE  ' + CONVERT(varchar(100),CONVERT(time(0),  getdate()))


What is printed is the following


10:49:48
10:49:58
DONE  10:49:58

Now when would you really use WAITFOR TIME? You can accomplish the same with a scheduled job; the only time I use WAITFOR TIME is if I want to run something at a specific time but I know I won't be at my desk and I can't create a job without a ticket

But you also have to be aware that if your connection to the SQL Server instance is lost, your command won't execute


WAITFOR DELAY

The WAITFOR DELAY command is similar to the WAITFOR TIME command; instead of waiting until a specific time, the command pauses for a specific period

Like I said before, I use WAITFOR DELAY together with a batch size in my back fill procs. Both can be passed in; if you do a load during a weekday, your delay would be longer than on a weekend.
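To give you an idea, here is a minimal sketch of such a back fill loop; the table and column names are made up for illustration:

DECLARE @BatchSize int = 5000 -- passed in to the real proc
DECLARE @Delay varchar(8) = '00:00:01' -- passed in, longer on weekdays
DECLARE @Rows int = 1

WHILE @Rows > 0
BEGIN
 INSERT dbo.TargetTable (SomeID, SomeValue)
 SELECT TOP (@BatchSize) s.SomeID, s.SomeValue
 FROM dbo.SourceTable s
 WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable t
   WHERE t.SomeID = s.SomeID)

 SET @Rows = @@ROWCOUNT

 WAITFOR DELAY @Delay -- pause between batches so replication can catch up
END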


Sometimes I need to see how many rows are getting inserted every minute... or something similar. I will then combine WAITFOR DELAY with the batch separator and a count to execute the batch of statements more than once

Here is such an example; it will run the INSERT statement 20 times, pausing 1 minute between each execution


-- #temp and sometable are placeholders; create them first in a separate batch
INSERT #temp(SomeCol, SomeTimeStamp)
SELECT COUNT(*), GETDATE() FROM sometable 
WAITFOR DELAY '00:01:00'
GO 20


That's all for this post.

Do you use the WAITFOR command? If so, what do you use it for?



Monday, January 20, 2020

TVPs vs Memory Optimized TVPs


The other day I was thinking about the blog post Faster temp table and table variable by using memory optimization I read a while back. Since you can't believe anything on the internets (no disrespect to whoever wrote that post), I decided to put this to the test

In this post I will be creating two databases: a plain vanilla database, and a database that also has a filegroup that contains memory optimized data

I will also be creating a table type in each database: a plain one in the regular database and a memory optimized one in the memory optimized database

So let's get started; first I will create the regular database and the regular table type

CREATE DATABASE TempTVP
GO

USE TempTVP
GO

CREATE TYPE dbo.DataProcessingType AS TABLE(
 SomeDate datetime NOT NULL,
 SomeSymbol varchar(40) NOT NULL,
 SomeValue numeric(24, 10) NOT NULL,
 SomeDescription varchar(100),
 index tvp_temp (SomeDate, SomeSymbol))
GO


Now I will create the memory optimized database and the memory optimized table type. In order for the database to be able to use memory optimized code, you need to add a filegroup and tell SQL Server it contains memory optimized data; after that is created, you add a file to that filegroup.

The table type syntax is identical except for the WITH (MEMORY_OPTIMIZED = ON) clause at the end

Here is what the script looks like



CREATE DATABASE TempTVPHekaton
GO

USE TempTVPHekaton
GO


ALTER DATABASE [TempTVPHekaton] ADD FILEGROUP [Hekaton_Data] 
CONTAINS MEMORY_OPTIMIZED_DATA 
GO


ALTER DATABASE [TempTVPHekaton] ADD FILE (NAME='Hekaton_Data_file',
 FILENAME='C:\Data\Hekaton_Data_file.mdf') TO FILEGROUP Hekaton_Data;
GO

CREATE TYPE dbo.DataProcessingType AS TABLE(
 SomeDate datetime NOT NULL,
 SomeSymbol varchar(40) NOT NULL,
 SomeValue numeric(24, 10) NOT NULL,
 SomeDescription varchar(100),
 index tvp_temp (SomeDate, SomeSymbol))
  WITH   (MEMORY_OPTIMIZED = ON); 
GO

Now that we have our two databases, let's create a very simple stored proc in each database; all it does is store the row count from the table valued parameter passed in into a variable


CREATE PROCEDURE prTestTVP @tvp DataProcessingType readonly

AS

SET NOCOUNT ON

DECLARE @Count int

SELECT @Count = COUNT(*) FROM @tvp
GO

Now it is time to generate the test script

The test script will call the stored procedure 1000 times, passing in a table valued parameter. On each iteration, the script populates the table type with 1000 rows of fake data, pushes that data into the table valued parameter, calls the proc, and clears out the table type; every 100 iterations the current iteration number is printed

SET NOCOUNT ON
DECLARE @LoopID int = 1

WHILE @LoopID <= 1000
BEGIN
 DECLARE @tvp DataProcessingType
 INSERT @tvp -- add some values
  SELECT DATEADD(d,number,'20200101') as SomeDate,
  'X' + STR(number) + STR(@LoopID) as SomeSymbol,
   number * @LoopID * 1.11 as SomeValue,
   LEFT(REPLICATE(number,100),100) as SomeDescription
 FROM master..spt_values
  WHERE type = 'p' -- only numbers
  and number < 1000
 ORDER BY NEWID() --pseudo-random sort


 EXEC prTestTVP @tvp -- CALL proc with 1000 rows
 
 DELETE @tvp -- delete the data since it will be populated again

  if @LoopID %100 = 0 -- print every 100 iterations
  PRINT STR(@LoopID)
 SET @LoopID += 1 -- add 1 to counter

END


I took the code, pasted it into 2 different SSMS windows connected to the TempTVP database, executed the code in both windows, and let it run. Once it was finished, I noted down how long it took, then changed the connections to the memory optimized TempTVPHekaton database and ran the code in both windows as well. I played around with loop counts of 100, 1000 and 2000, and with populating the table with between 1000 and 2048 rows
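I didn't include the timing in the script above; if you want to capture the elapsed time of a run yourself, a minimal way to do it (my own addition, not part of the original test) is to wrap the loop like this:

DECLARE @start datetime2 = SYSDATETIME()

-- ... run the WHILE loop from the test script here ...

SELECT DATEDIFF(millisecond, @start, SYSDATETIME()) as ElapsedMs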

Here are some of the results



DB Storage   Iterations * rows   Percentage of time
Disk         1000 * 1000         85.37%
Memory       1000 * 1000         14.63%
Disk         1500 * 1000         76.36%
Memory       1500 * 1000         23.64%
Disk         5000 * 100          92.31%
Memory       5000 * 100           7.69%


So the memory optimized version looks to be roughly 3 to 12 times faster depending on the run; with a smaller table and more iterations, the gap gets even bigger


I did run into an issue while testing: if I made it execute 5000 times with a 2000 row table, I was greeted by the following error

Msg 701, Level 17, State 154, Procedure prTestTVP, Line 7 [Batch Start Line 0]
There is insufficient system memory in resource pool 'default' to run this query.

This code was running on a laptop where I had 40 tabs open in Chrome, so there was not a lot of free memory. I also didn't create a resource pool; everything was a default setup

If you look at the code, you will see that I clear out the table after each iteration.

However, the table variable doesn't go out of scope until the loop is finished. In my real world scenario I don't have this issue; my procs are called by many processes, but not in a loop

To read more about this error start here

Be aware of 701 error if you use memory optimized table variable in a loop

This is actually by-design behavior documented in “Memory-Optimized Table Variables”. Here is what it states: “Unlike memory-optimized tables, the memory consumed (including deleted rows) by table variables is freed when the table variable goes out of scope”. With a loop like above, all deleted rows will be kept and consume memory until the end of the loop.
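One possible workaround in a loop scenario, assuming you can change the calling code, is to move the table variable into its own wrapper proc so that it goes out of scope, and its memory is freed, on every iteration. A rough sketch:

CREATE PROCEDURE prRunOneIteration @LoopID int
AS
SET NOCOUNT ON

-- the table variable goes out of scope when this proc returns
DECLARE @tvp DataProcessingType
INSERT @tvp
SELECT DATEADD(d,number,'20200101'),
 'X' + STR(number) + STR(@LoopID),
 number * @LoopID * 1.11,
 LEFT(REPLICATE(number,100),100)
FROM master..spt_values
WHERE type = 'p' and number < 1000

EXEC prTestTVP @tvp
GO

The outer loop would then just call prRunOneIteration once per iteration instead of declaring the table variable itself.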


There you go... if you are using table types, switching them to in-memory table types might help your application perform better. But of course, as I said before... since you can't believe anything on the internets, test for yourself


Monday, December 30, 2019

Top 10 posts from the last decade

Out of focus Christmas tree

As we finish the tumultuous 2010s and are ready for the roaring 2020s, I decided to take a quick look at the ten most viewed posts from the past decade. Two of these posts were posted before 2010

Without any fanfare, here is the list


10. Some cool SQL Server announcements SQL Graph, Adaptive Query Plan, CTP1 of SQL vNext, SQL Injection detection
This is my recap of the chalkboard session with the SQL Server team at the SQL Server PASS summit in Seattle.

09. Convert Millisecond To "hh:mm:ss" Format
A very old post showing you how to convert from milliseconds to "hh:mm:ss" format

08. Can adding an index make a non SARGable query SARGable?
A post showing you how adding an index can make a query use that index even though the index column doesn't match the query

07. A little less hate for: String or binary data would be truncated in table
Can you believe they actually managed to accomplish this during the past decade  :-)

06. Some numbers that you will know by heart if you have been working with SQL Server for a while
After working with SQL Server for a while, you should know most of these

05. Use T-SQL to create caveman graphs
One of the shortest posts on this site, showing you how you can make visually appealing output with a pipe symbol

04. Ten SQL Server Functions That You Hardly Use But Should
A post from 2007 showing some hardly used functions like NULLIF, PARSENAME and STUFF

03. Your lack of constraints is disturbing
A post showing the type of constraints available in SQL Server with examples

02. Five Ways To Return Values From Stored Procedures
A very old post that shows you five ways to return values from a stored proc

01. After 20+ years in IT .. I finally discovered this...
What can I say, read it and let me know if you knew this one....



Wednesday, October 30, 2019

SQLSTATE 42000 Error 666 and what to do.




This morning I was greeted by the following message from a job email

The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 72059165481762816. Dropping and re-creating the index may resolve this; otherwise, use another clustering key. [SQLSTATE 42000] (Error 666)


Almost Halloween? check!
Error 666? check!
Ever seen this error before? no!


The job has a step that inserts into a bunch of tables... 
The table in question had a clustered index that was created without the UNIQUE property. When you create such an index, SQL Server will create a uniqueifier internally

This part is from the CSS SQL Server Engineers blog post

A uniqueifier (or uniquifier as reported by SQL Server internal tools) has been used in the engine for a long time (since SQL Server 7.0), and even being known to many, referenced in books and blogs, The SQL Server documentation clearly states that you will not see it exposed externally in the engine (https://docs.microsoft.com/en-us/sql/relational-databases/sql-server-index-design-guide).


"If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users."

While it´s unlikely that you will face an issue related with uniqueifiers, the SQL Server team has seen rare cases where customer reaches the uniqueifier limit of 2,147,483,648, generating error 666.


Msg 666, Level 16, State 2, Line 1

The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID <PARTITIONID>. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.


So I ran into this rare case :-(

How can you quickly find out what table and index name the error is complaining about?

You can use the following query, just change the partitionid to match the one from your error message


SELECT SCHEMA_NAME(o.schema_id) as SchemaName, 
  o.name as ObjectName, 
  i.name as IndexName, 
  p.partition_id as PartitionID
FROM sys.partitions p
JOIN sys.objects o on p.object_id = o.object_id
JOIN sys.indexes i on p.object_id = i.object_id AND p.index_id = i.index_id
WHERE p.partition_id = 72059165481762816

After running the query, you will now have the schema name, the table name and the index name. That is all you need to find the index; you can now drop and recreate it

In my case this table was not big at all... 5 million rows or so, but we do delete and insert a lot of data into this table many times a day. Also, we have rebuild jobs running, and rebuild jobs do not reset the uniqueifier (see also below about a change from the CSS SQL Server Engineers)


To fix this, all I had to do was drop and recreate the index (after filling out tickets and testing it in a lower environment first).

DROP INDEX [IX_IndexName] ON [SchemaName].[TableName]
GO

CREATE CLUSTERED INDEX [IX_IndexName] ON [SchemaName].[TableName] 
(
 Col1 ASC,
 Col2 ASC,
 Col3 ASC
) ON [PRIMARY]
GO


After dropping and recreating the index.. the code that threw an error earlier did not throw an error anymore

Since my table only had 5 million rows or so.. this was not a big deal and completed in seconds. If you have a large table you might have to wait or think of a different approach


If you want to know more, check out this post by the CSS SQL Server Engineers Uniqueifier considerations and error 666

The interesting part is

As of February 2018, the design goal for the storage engine is to not reset uniqueifiers during REBUILDs. As such, rebuild of the index ideally would not reset uniquifiers and issue would continue to occur, while inserting new data with a key value for which the uniquifiers were exhausted. But current engine behavior is different for one specific case, if you use the statement ALTER INDEX ALL ON <TABLE> REBUILD WITH (ONLINE = ON), it will reset the uniqueifiers (across all version starting SQL Server 2005 to SQL Server 2017).

Important: This is something that is not documented and can change in future versions, so our recommendation is that you should review table design to avoid relying on it.
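So, per that post, if dropping and recreating the index is not an option, the one statement that currently resets the uniqueifiers would look like this; keep in mind the warning above that this behavior is undocumented and can change, and that ONLINE = ON requires Enterprise Edition:

ALTER INDEX ALL ON [SchemaName].[TableName] REBUILD WITH (ONLINE = ON)
GO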




Edit... it turns out I have seen this before and have even blogged about it: http://sqlservercode.blogspot.com/2017/06/having-fun-with-maxed-out-uniqifiers-on.html



Tuesday, June 11, 2019

Can adding an index make a non SARGable query SARGable?


This question came up the other day from a co-worker: he said he couldn't change a query, but was there a way to make the same query produce a better plan by doing something else, perhaps (magic?)

He said his query had a WHERE clause that looked like the following

WHERE RIGHT(SomeColumn,3) = '333'

I then asked if he could change the table; his answer was that he couldn't mess around with the current columns, but he could add a column

Ok, that got me thinking about a solution, let's see what I came up with


First create the following table


USE tempdb
GO


CREATE TABLE StagingData (SomeColumn varchar(255) NOT NULL )

ALTER TABLE dbo.StagingData ADD CONSTRAINT
 PK_StagingData PRIMARY KEY CLUSTERED 
 (
 SomeColumn
 )  ON [PRIMARY]

GO



We will create some fake data by appending a dot and a number between 100 and 999 to a GUID

Let's insert one row so that you can see what the data will look like

DECLARE @guid uniqueidentifier
SELECT @guid = 'DEADBEEF-DEAD-BEEF-DEAD-BEEF00000075' 

INSERT StagingData
SELECT CONVERT(varchar(200),@guid) + '.100'

SELECT * FROM StagingData


Output

SomeColumn
--------------------------------
DEADBEEF-DEAD-BEEF-DEAD-BEEF00000075.100




Time to insert the rest of the rows

Here is what the code looks like

INSERT StagingData
SELECT top 999999 CONVERT(varchar(200),NEWID()) 
 +  '.' 
 + CONVERT(VARCHAR(10),s2.number)
FROM master..SPT_VALUES s1
CROSS JOIN master..SPT_VALUES s2
WHERE s1.type = 'P'
AND s2.type = 'P'
and s1.number between 100 and 999
and s2.number between 100 and 999




With that completed, we should now have about 810,000 rows (the TOP 999999 asks for more rows than the 900 x 900 cross join actually produces, but that is plenty for this test)


If we run our query to look for rows where the last 3 characters are 333 we can see that we get a scan

SET STATISTICS IO ON
GO

SELECT SomeColumn FROM StagingData
WHERE RIGHT(SomeColumn,3) = '333'


SET STATISTICS IO OFF
GO




(900 rows affected)
Table 'StagingData'. Scan count 1, logical reads 5404, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We get 900 rows back and 5404 reads


Here is what the execution plan looks like







If we always query for the last 3 characters, what we can do is add a computed column to the table that just contains the last 3 characters and then add a nonclustered index to that column

That code looks like this

ALTER TABLE StagingData ADD RightChar as RIGHT(SomeColumn,3)
GO


CREATE INDEX ix_RightChar on StagingData(RightChar)
GO


Now let's check what we get when we use this new column

SET STATISTICS IO ON
GO

SELECT SomeColumn  FROM StagingData
WHERE RightChar  = '333'


SET STATISTICS IO OFF
GO



(900 rows affected)
Table 'StagingData'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


The reads went from 5404 to 10; that is a massive improvement. Here is what the execution plan looks like



However there is a small problem.....

We said we would not modify the query...

What happens if we execute the same query from before?  Can the SQL Server optimizer recognize that our new column and index is pretty much the same as the WHERE clause?

SET STATISTICS IO ON
GO

SELECT SomeColumn FROM StagingData
WHERE RIGHT(SomeColumn,3) = '333'


SET STATISTICS IO OFF
GO

(900 rows affected)
Table 'StagingData'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Damn right, the optimizer can, there it is: it uses the new index and column even though we specified the original column..... (must be all that AI built in... (just kidding))

If you look at the execution plan, you can see it is indeed a seek

So there you have it... sometimes you can't change the query and you can't mess around with the existing columns, but you can add a column to the table; in that case, a technique like this one can be beneficial
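One side note on the design: I created the computed column as non-persisted, and the seek still works because the nonclustered index itself stores the computed values. If you would rather also store the values in the base table, you could mark the column as PERSISTED; something like this, with a hypothetical column name:

ALTER TABLE StagingData ADD RightCharPersisted as RIGHT(SomeColumn,3) PERSISTED
GO

That costs some storage, but it can help if you also filter on the column in queries that don't use the index.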



PS

Betteridge's law of headlines is an adage that states: "Any headline that ends in a question mark can be answered by the word no." It is named after Ian Betteridge, a British technology journalist who wrote about it in 2009

In this case as you can plainly see...this is not true  :-) The answer to "Can adding an index make a non SARGable query SARGable?" is clearly yes

Wednesday, April 24, 2019

How to count NULLS without using IS NULL in a WHERE clause



This came up the other day, someone wanted to know the percentage of NULL values in a column

Then I said "I bet you I can run that query without using a NULL in the WHERE clause, as a matter of fact, I can run that query without a WHERE clause at all!!"

The person then wanted to know more, so you know what that means.. it becomes a blog post  :-)

BTW, the PostgreSQL version of this blog post can be found here:  A quick and easy way to count the percentage of nulls without a where clause in PostgreSQL


To get started, first create this table and verify you have 9 rows

CREATE TABLE foo(bar int)
INSERT foo values(1),(null),(2),(3),(4),
 (null),(5),(6),(7)

SELECT * FROM foo

Here is what the output should be

bar
1
NULL
2
3
4
NULL
5
6
7

To get the counts of the NULL values and the non-NULL values, you can do something like this


SELECT COUNT(*) as CountNotNull FROM foo WHERE bar IS NOT NULL
SELECT COUNT(*) as CountNull FROM foo WHERE bar IS NULL

However, there is another way

Did you know that COUNT behaves differently if you use a column name compared to when you use *?

Take a look

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn
FROM foo

If you ran that query, the result is the following

CountAll    CountColumn
----------- -----------
9           7

Warning: Null value is eliminated by an aggregate or other SET operation.


And did you notice the warning? That came from the count against the column

Let's see what Books On Line has to say


COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.

COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.

COUNT(DISTINCT expression) evaluates expression for each row in a group, and returns the number of unique, nonnull values.

This is indeed documented behavior

So now, let's change our query to return the percentage of non-null values in the column

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues 
FROM foo

Here is the output

CountAll    CountColumn PercentageOfNonNullValues
----------- ----------- ---------------------------------------
9           7           77.777777777700

I just want to point out one thing: the reason I have this * 1.0 in the query

(COUNT(bar)*1.0/COUNT(*))*100

I am doing * 1.0 here because COUNT returns an integer; without it you would end up with integer math, and PercentageOfNonNullValues would be 0 instead of 77.7777...
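And if you want the percentage of NULL values instead, still without a WHERE clause, you can flip the calculation around; a small variation on the query above:

SELECT COUNT(*) as CountAll, 
  COUNT(*) - COUNT(bar) as CountNulls, 
  ((COUNT(*) - COUNT(bar))*1.0/COUNT(*))*100 as PercentageOfNullValues 
FROM foo

Against our 9 row table this returns 22.2222..., the complement of the 77.7777... we got before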


That's it for this short post.. hopefully you knew this, if not, then you know it now  :-)




Sunday, April 14, 2019

How to check if an Extended Event session exists before dropping it



Are you still running profiler, or have you switched to using Extended Events? I use Extended Events almost exclusively now because it's so much easier compared to using profiler or trace from T-SQL. Not to mention you can capture more things


The other day someone checked in some code and every now and then the build would fail with the error

Msg 15151, Level 16, State 19, Line 51
Cannot drop the event session 'ProcsExecutions', because it does not exist or you do not have permission.

I decided to take a look at the code and saw what the problem was. I will recreate the code here and then show you what needs to be changed. This post will not go into what Extended Events are; you can look that up in the SQL Server Extended Events documentation

Start by creating the Extended Event session by executing the following T-SQL

CREATE EVENT SESSION ProcsExecutions ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
 ),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
  )


To start the Extended Event session from T-SQL, execute the following command

ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 


Below is what the code looked like that was checked in.

You can run it and it will execute without a problem

IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO


CREATE EVENT SESSION ProcsExecutions ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
 ),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
  )


  ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 
   GO

However if you run the following command now

 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = STOP; 

And then execute the same create Extended Event T-SQL Query again

IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO


CREATE EVENT SESSION ProcsExecutions ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
 ),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
  )


 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 
   GO


You will get the error

Msg 25631, Level 16, State 1, Line 29
The event session, "ProcsExecutions", already exists.  Choose a unique name for the event session.

So why is that? There are two views at play: sys.dm_xe_sessions and sys.server_event_sessions. The DMV sys.dm_xe_sessions only returns a row for Extended Event sessions that are in the running state, while the catalog view sys.server_event_sessions returns a row even if the Extended Event session is not currently running

Let's take a look at what that looks like by running some queries and commands

First we are going to stop the session and then query the sys.dm_xe_sessions DMV

--Stop the session if it is running
 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = STOP; 
   GO

-- this query returns only the running Extended Event sessions
SELECT dxs.name,
dxs.create_time,*
FROM sys.dm_xe_sessions AS dxs;

 Output
-----------------
hkenginexesession
system_health
sp_server_diagnostics session
telemetry_xevents

As you can see our Extended Event session is not returned because it is not in a running state

Now let's query sys.server_event_sessions and check if our Extended Event session is returned

-- this query returns also Extended Event sessions that are not currently running
 SELECT *
 FROM sys.server_event_sessions 

 Output
-----------------
system_health
AlwaysOn_health
telemetry_xevents
ProcsExecutions

As you can see, our Extended Event session is returned even though it is not in a running state

If we now start the session again and then check the sys.dm_xe_sessions DMV, we will get back our session

-- start the session again
   ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 
   GO


SELECT dxs.name,
dxs.create_time,*
FROM sys.dm_xe_sessions AS dxs;

 Output
-----------------
hkenginexesession
system_health
sp_server_diagnostics session
telemetry_xevents
ProcsExecutions


So now our Extended Event session is returned because it is in a running state

Instead of this query to check for the existence

IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO


What we really want is this

IF EXISTS (SELECT name FROM sys.server_event_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO

So basically we change sys.dm_xe_sessions to sys.server_event_sessions in the IF EXISTS check. It is a pretty easy change; just swap out the view
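If you want to see at a glance which defined sessions are currently running, you can also join the two views; a quick sketch:

SELECT ses.name, 
  CASE WHEN dxs.name IS NOT NULL THEN 'Running' ELSE 'Stopped' END as SessionState
FROM sys.server_event_sessions ses
LEFT JOIN sys.dm_xe_sessions dxs ON ses.name = dxs.name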



If you want to stop a session if it is running, you can go ahead and implement something like this

 IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
 BEGIN
 PRINT 'The Session Was Running'

 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = STOP; 
END

IF NOT EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
 PRINT 'The Session is NOT Running'


That's all for this post, hopefully it will be useful to someone




Saturday, April 13, 2019

How to improve your tech skills

Today we are going to look at how to improve your tech skills. This really is a continuation of the Stay relevant and marketable post from a couple of weeks ago. Here are some things that you can do to improve your tech skills


Attend usergroups

Attend your local usergroup meetings; there is always some expert who comes to do a presentation.


Answer questions

I still think answering questions is one of the best ways to improve your skills. Join a Q&A site like stackoverflow, or head over to a specialized site on stackexchange; here is a list of all of them: http://stackexchange.com/sites



If you are not comfortable with answering yet, or if you realize that the questions are too difficult, don't worry about it, just start by lurking. What you will find out over time is that every month you will be able to answer more and more of these questions. This is because the questions are pretty much the same, but some little detail might be different. After a while you will notice that there will be very few questions that you won't be able to answer in your field of expertise


Lunch and learns

No time to improve your skills, you say? Do you take lunch breaks? If so, consider doing lunch and learns: get into a conference room, fire up the projector, and then look at code with the team, do design, or watch videos... whatever floats your boat


Get involved with an open source project

A good way to improve your skills is to get involved with an open source project. Pick a project, download it, then pick it apart. Start reading through the code, notice how things are done, and ask yourself why it was done that way. Would you do it the same way? If you pick a big enough project, there will be many contributors; can you tell that the code was stitched together, or does it pretty much look like it was written by one person? Are standards followed? How many design patterns are used?


Read books, read code, read blogs

There are many classic lists of books that every programmer should read. Here is just a small list that you can choose from; I grabbed this from stackoverflow

Code Complete (2nd edition) by Steve McConnell
The Pragmatic Programmer
Design Patterns by the Gang of Four
Refactoring: Improving the Design of Existing Code
The Mythical Man Month
Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
CODE by Charles Petzold
Working Effectively with Legacy Code by Michael C. Feathers
Peopleware by Demarco and Lister
Coders at Work by Peter Seibel
Patterns of Enterprise Application Architecture by Martin Fowler
Test-Driven Development: By Example by Kent Beck
Practices of an Agile Developer
Don't Make Me Think
Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin
Domain Driven Designs by Eric Evans
The Design of Everyday Things by Donald Norman
JavaScript - The Good Parts
Getting Real by 37 Signals
The Annotated Turing
Agile Principles, Patterns, and Practices in C# by Robert C. Martin
The Soul of a New Machine by Tracy Kidder
Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky
Pragmatic Unit Testing in C# with NUnit by Andy Hunt and Dave Thomas with Matt Hargett
Rework by Jason Freid and DHH
JUnit in Action

Reading code is also a good way to improve your skills: head over to the frameworks you use the most, start digging around in the API, and look at the example code. Read blogs of subject experts and study their code and techniques; if something is not clear, don't hesitate to leave a comment asking for more info or further explanation


Practice by doing katas

If you have ever done Karate you will know what a kata is; it is basically the practice of forms. A kata, or code kata, is defined as an exercise in programming which helps hone your skills through practice and repetition. Dave Thomas started this movement for programming. You can find a list of awesome katas here: https://github.com/gamontal/awesome-katas


Blog

I found that blogging has been very good for my tech skills. It keeps me sharp and since I blog about new things it keeps my skill set up to date. When blogging, your readers will tell you when the code is wrong, so you have to make sure everything is tested and will run as shown in the post. Since you will have to do some research when writing these blog posts, your skills will improve and expand.
An added bonus is that I have a code library that I can access anytime I want.


Write a book

If you are a masochistic type of person then I recommend you write a book, almost everybody in the tech world that I know swore that they would never write a book again when they were done.......and yet they did. In order to write a book you have to spend a LOT of time doing research, making sure your code is correct and much more. Once you are done with this if you were not a subject expert you will be now. The worst part of writing a book is the initial feedback you get pointing out all your mistakes, if you are not thick skinned this could become a problem.


Listen to podcasts, watch webinars

I get a lot of my tech info from podcasts; I like it better than listening to music at times, and it makes the commute or run more enjoyable. The benefit is that you will learn something, and you might hear about some new shiny thing that you will want to check out when you get to the computer. There are many things I have learned from podcasts, and I always look forward to the next episode