Showing posts with label Best Practices. Show all posts
Showing posts with label Best Practices. Show all posts

Sunday, April 14, 2019

How to check if an Extended Event session exists before dropping it



Are you still running profiler or have you transferred to using Extended Events? I use Extended Events almost exclusively now because it's so much easier compared to using profiler or trace from T-SQL. Not to mentioned you can capture more things


The other day someone checked in some code and every now and then the build would fail with the error

Msg 15151, Level 16, State 19, Line 51
Cannot drop the event session 'ProcsExecutions', because it does not exist or you do not have permission.

I decided to take a look at the code and saw what the problem was. I will recreate the code here and then show you what needs to be changed.  This post will not go into what Extended Events are, you can look that up in the SQL Server  Extended Events documentation

Start by creating the Extended Event session by executing the following T-SQL

CREATE EVENT SESSION ProcsExecutions ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
 ),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
  )


To start the Extended Event session from T-SQL, execute the following command

ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 


Below is what the code looked like that was checked in.

You can run it and it will execute without a problem

IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO


CREATE EVENT SESSION ProcsExecutions ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
 ),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
  )


  ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 
   GO

However if you run the following command now

 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = STOP; 

And then execute the same create Extended Event T-SQL Query again

IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO


CREATE EVENT SESSION ProcsExecutions ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
 ),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlos.task_time,sqlserver.client_app_name,
 sqlserver.client_hostname,sqlserver.database_name,sqlserver.sql_text)
    WHERE ([sqlserver].[database_name]=N'Test')
  )


 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 
   GO


You will get the error

Msg 25631, Level 16, State 1, Line 29
The event session, "ProcsExecutions", already exists.  Choose a unique name for the event session.

So why is that?  There are 2 DMV that exist sys.dm_xe_sessions and sys.server_event_sessions. The DMV sys.dm_xe_sessions only returns a row for Extended Event sessions that are in the running state, the DMV sys.server_event_sessions will return a row even if the Extended Event session is not currently running

Lets' take a look at what that looks like by running some queries and commands

First we are going to stop the session and then query the sys.dm_xe_sessions DMV

--Stop the session if is running
 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = STOP; 
   GO

-- this query returns only the running Extended Event sessions
SELECT dxs.name,
dxs.create_time,*
FROM sys.dm_xe_sessions AS dxs;

 Output
-----------------
hkenginexesession
system_health
sp_server_diagnostics session
telemetry_xevents

As you can see our Extended Event session is not returned because it is not in a running state

Now lets's query the sys.server_event_sessions DMV and check if our Extended Event session is returned

-- this query returns also Extended Event sessions that are not currently running
 SELECT *
 FROM sys.server_event_sessions 

 Output
-----------------
system_health
AlwaysOn_health
telemetry_xevents
ProcsExecutions

As you can see our  Extended Event session is returned even though it is not in a running state

If we now start the session again and then check the sys.dm_xe_sessions DMV, we will get back out session

-- start the session again
   ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = START; 
   GO


SELECT dxs.name,
dxs.create_time,*
FROM sys.dm_xe_sessions AS dxs;

 Output
-----------------
hkenginexesession
system_health
sp_server_diagnostics session
telemetry_xevents
ProcsExecutions


So now our Extended Event session is returned because it is in a running state

Instead of this query to check for the existence

IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO


What we really want is this

IF EXISTS (SELECT name FROM sys.server_event_sessions  WHERE name = 'ProcsExecutions')
DROP EVENT SESSION ProcsExecutions ON SERVER
GO

So basically we change the dmv from sys.dm_xe_sessions to sys.server_event_sessions in IF EXISTS check

So it is a pretty easy change, just swap out the DMV



If you want to stop a session if it is running, you can go ahead and implement something like this

 IF EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
 BEGIN
 PRINT 'The Session Was Running'

 ALTER EVENT SESSION ProcsExecutions
   ON SERVER  
   STATE = STOP; 
END

IF NOT EXISTS (SELECT name FROM sys.dm_xe_sessions  WHERE name = 'ProcsExecutions')
 PRINT 'The Session is NOT Running'


That's all for this post, hopefully it will be useful to someone




Saturday, April 13, 2019

How to improve your tech skills

Today we are going to look at how to improve your tech skills. This really is a continuation of the Stay relevant and marketable post from a couple of weeks ago. Here are some things that you can do to improve your tech skills


Attend usergroups

Attend your local usergroup meetings, there is always some expert that comes to do presentations.


Answer questions

I still think answering questions is one of the best ways to improve your skill. Join a QA site like stackoverflow, head on to a specialized site on stackexchange, here is a list of all of them http://stackexchange.com/sites



If you are not comfortable with answering yet or if you realize that the questions are too difficult, don't worry about, just start by lurking. What you will find out over time is that every month you will be able to answer more and more of these question. This is because the questions are pretty much the same but some little detail might be different.  After a while you will notice that there will be very few questions that you won't be able to answer in your field of expertise


Lunch and learns

No time you say to improve your skills, do you take lunch breaks? If so consider doing lunch and learns, get into a conference room, fire up the projector and then either look at code with the team, do design, watch videos, whatever floats your boat


Get involved with an open source project

A good way to improve your skills is to get involved with an open source project. Pick a project download it, then pick it apart. Start reading through the code, notice how things are done, ask yourself why it was done that way. Would you do it the same way? If you pick a big enough project, there will be many contributors, can you tell that the code was put together or does it pretty much look like it was written by one person. Are standards followed, how many design patterns are used


Read books, read code, read blogs

There are many classic list of books that every programmer should read
Here is just a small list that you can choose from, I grabbed this from stackoverflow

Code Complete (2nd edition) by Steve McConnell
The Pragmatic Programmer
Design Patterns by the Gang of Four
Refactoring: Improving the Design of Existing Code
The Mythical Man Month
Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
CODE by Charles Petzold
Working Effectively with Legacy Code by Michael C. Feathers
Peopleware by Demarco and Lister
Coders at Work by Peter Seibel
Patterns of Enterprise Application Architecture by Martin Fowler
Test-Driven Development: By Example by Kent Beck
Practices of an Agile Developer
Don't Make Me Think
Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin
Domain Driven Designs by Eric Evans
The Design of Everyday Things by Donald Norman
JavaScript - The Good Parts
Getting Real by 37 Signals
The Annotated Turing
Agile Principles, Patterns, and Practices in C# by Robert C. Martin
The Soul of a New Machine by Tracy Kidder
Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky
Pragmatic Unit Testing in C# with NUnit by Andy Hunt and Dave Thomas with Matt Hargett
Rework by Jason Freid and DHH
JUnit in Action

Reading code is also a good way to improve your skills, head over to the frameworks you use the most and start digging around in the API, look at the example code.
Read blogs of subject expert, study their code and techniques, if something is not clear don't hesitate to leave a comment asking for some info or further explanation


Practice by doing katas

If you have ever done Karate you will know what a kata is, it is basically the practice of forms. A kata, or code kata, is defined as an exercise in programming which helps hone your skills through practice and repetition. Dave Thomas, started this movement for programming. You can find a list of awesome katas here: https://github.com/gamontal/awesome-katas


Blog

I found that blogging has been very good for my tech skills. It keeps me sharp and since I blog about new things it keeps my skill set up to date. When blogging, your readers will tell you when the code is wrong, so you have to make sure everything is tested and will run as shown in the post. Since you will have to do some research when writing these blog posts, your skills will improve and expand.
An added bonus is that I have a code library that I can access anytime I want.


Write a book

If you are a masochistic type of person then I recommend you write a book, almost everybody in the tech world that I know swore that they would never write a book again when they were done.......and yet they did. In order to write a book you have to spend a LOT of time doing research, making sure your code is correct and much more. Once you are done with this if you were not a subject expert you will be now. The worst part of writing a book is the initial feedback you get pointing out all your mistakes, if you are not thick skinned this could become a problem.


Listen to podcast, watch webinars

I get a lot of my tech info from podcasts, I like it better than listening to music at times and it makes the commute or run more enjoyable. The benefit is that you will learn something, you also might hear about some new shiny thing and then you will want to check it out when you get to the computer. There are many things I have learned from podcast, I also look forward to the next episode


Monday, March 18, 2019

Stay relevant and marketable



It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.


Whenever I interview people for a DB position, I always ask what the latest version is that they have used. I will also ask if they have used the version that is in beta now. More often than not I will get an answer that they are still running something that is two versions behind. Then I ask if they have installed the latest and greatest on their home computer. The answer to that question is usually no.

This to me is crazy, surely you can't be in this field just for the money, where is the passion for discovering new things? Imagine you are a person who makes tables, after a while you will become a master and there is not much more for you to learn. The curse and the blessing of technology is that it is changing, and it is changing rapidly. If you don't invest time after work, before work and over the weekend to discover new things, play with the newest versions and sharpen your skill you will become obsolete, there are many people like that, they fall apart during the interview process.


But I have no time to do all this additional work

People will complain that they don't have enough time to do these additional things. Here are some things you can do if you are short on time. If commuting by public transportation read a book or download the latest papers about the newest versions of the product and read them. If you workout try listening to podcast while doing your aerobic exercises. Perhaps you can set the speed on your player to be 30% faster, this way you can get more podcasts in the same amount of time.

Ask yourself what will help you advance, having up to date skill or being up to date with the latest episodes of Billions, Narcos or American Gods? Another option is to watch the shows on the train on a tablet and then do the tech stuff at home.
Attend an launch event or go to your local usergroup meeting, there are usually sessions on the latest and greatest versions of the software. Youtube also has tons of free sessions.

Here are the Amazon AWS re:Invent 2018 breakout sessions https://www.youtube.com/user/AmazonWebServices/playlists?shelf_id=33&view=50&sort=dd

The Microsoft Ignite sessions can be found here:
https://www.youtube.com/channel/UCrhJmfAGQ5K81XQ8_od1iTg

The Google Developer sessions can be found here:
https://www.youtube.com/user/GoogleDevelopers

Database related sessions: Azure SQL Data Warehouse, Azure SQL Database and SQL Server sessions from the Build 2018 conference

Do you eat lunch? If so,  instead of going out for lunch, bring your lunch to the office. While eating your lunch at your desk, watch some of the sessions listed above or read some documentation or books about technologies that you are interested in. If you prefer to read and you want to learn about SQl Server Execution Plans? Head on over to Hugo Kornelis' SQL Server Execution Plan Reference site


Step outside your comfort zone

Try out some other things, become a polyglot programmer. If you are a Java developer, give Scala or Clojure a try. If you are a .NET developer then try out F#, IronPython or Boo. If you are a SQL Server guy why not start playing around with Oracle, PostgreSQL or perhaps even a flavor of NoSQL, take a look at MongoDB, CouchDB, Cassandra and other solutions.

Have you looked at Big Data? This is already getting big and it will only get bigger. Ever heard of Hadoop? No, heard of facebook? Sure you have, guess what, facebook claims to have the biggest Hadoop cluster, It is over 100 Petabytes and it grows by about half a Petabyte per day, and you thought your database was big :-)

Trends are cyclical, every 10 years or so something new and big comes along. In the late 90s this was data warehousing, OLAP cubes, dimensions, fact tables. Every company these days does some sort of data warehousing. Now it is big data, data science, AI and machine learning that is new and shiny. Take a look at some of that stuff


Finally have you looked at the cloud yet? If you have not... I beg you to take a look. 
If you think that you need to have money to get started, this is not true


You can start with the free tiers to get some experience. All 3 major could providers offer free tiers

Here are the links to these free tiers in order of cloud vendor by market share

Amazon Webservices
https://aws.amazon.com/free/


Microsoft Azure
https://azure.microsoft.com/en-us/free/

Google Cloud 
https://cloud.google.com/free/


That's it for this post.... happy learning.....

Saturday, March 16, 2019

Use sys.configurations to find features that are enabled but perhaps not used



It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.


Today we are going to look at servers where everything is installed and enabled. Before we start this post let's look back in time a little. Before SQL Server 2005 came out when you installed SQL Server pretty much everything was turned on by default. This of course widened the attack vector against the database servers. With SQL Server 2005 pretty much everything is turned off and you have to turn the features on if you want to use them. Now sometimes some admins will just turn everything on because that way they don't have to deal with this later, these are also the same kind of people who insist that the account needs to be db_owner otherwise their code won't work.


To see what these features are and if they are turned off, you can use the following query.


SELECT name, value,value_in_use 
FROM sys.configurations
WHERE name IN (
'Agent XPs',
'SQL Mail XPs',
'Database Mail XPs',
'SMO and DMO XPs',
'Ole Automation Procedures',
'SQL Mail XPs',
'external scripts enabled',
'Web Assistant Procedures',
'xp_cmdshell',
'Ad Hoc Distributed Queries',
'hadoop connectivity',
'polybase enabled',
'Replication XPs',
'clr enabled')

Here is what the output looks like on my laptop

sys.configurations output


The difference between value and value_in_use is that value_in_use is what is currently used and value will be used next time the server is restarted. If you want to have the change take effect immediately then use RECONFIGURE

As you can see xp_cmdshell is turned on

Here is an example that will turn off xp_cmdshell



EXECUTE sp_configure 'show advanced options', 1
RECONFIGURE
GO
 
EXECUTE sp_configure 'xp_cmdshell', '0'
RECONFIGURE
GO
 
EXECUTE sp_configure 'show advanced options', 0
RECONFIGURE
GO



To enable a feature use the value 1, to disable a feature use the value 0

If you prefer the GUI, you can also use that, right click on the database server name in SSMS, select Facets, from the Facet drop down select Surface Area Configuration. You will see the following Surface Area Configuration properties


Here you can enable or disable the features you are interested in. You can also export these properties as a policy to use on other servers so that the features are the same on all your servers


Installing everything by default

Here I have mixed feelings myself about what to do. On one hand I don't like to install SSAS or SSRS if there is no need for it, on the other hand I don't feel like adding that stuff 6 months down the road if there is suddenly a need for it. If I do install it, I make sure it at least doesn't run by default but it is disabled.

There is no benefit in having SSAS, SSRS or SSIS running and using CPU cycles as well as RAM if nobody is using these services. If you do install it and nobody uses it, disable the services, you will have more RAM and CPU cycles for the SQL Server service available.


One more thing I want to mention is that I have run into something like this many times.... The Dev or Test server has everything enabled. We deploy to production and oops...  CLR is not enabled (quick fix) or SSRS/SSAS is not installed (need to install it, will take longer)

So make sure before deploying that you check what is enabled and installed on the production box



Friday, January 04, 2019

Proactive notifications

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.


 Today we are going to look at proactive notifications

In the SQL Server Maintenance post from yesterday I touched upon proactive notifications a little, today I want to dive a little deeper into this subject.
The last thing you want to hear as a DBA is  ny of the following things from the end users or developers
  • The transaction log is full
  • The database is very slow
  • The latest backup we have is 9 days old
  • The table that was created has 2 extra columns this morning
  • Everything is locked up can't get any results back from a query
  • Deadlocks are occurring
What you really want to have at your shop is a tool like Quest FoglightSolarwindsRed Gate SQL Monitor or similar. The benefit of these tools is that there is a central location where you can look at all the alerts at a glance. You get a lot of stuff out of the box and all you have to do is tell it what server to start monitoring. I would suggest to start using the trial version to see if it is something that would be beneficial for your organization.
Of course you can roll your own solution as well, this will involve work and unless your time is worthless or you are bored out of your mind after work I wouldn't do it.

Utilize the logs

You need to scan the errorlog periodically to see if there are errors, you can automate this, no need to start opening log files every 5 minutes. create a SQL Agent job that runs every 5 minutes and checks if there are any errors since it last ran. You can use the xp_readerrorlog proc to read the error log from with sql server with T-SQL.
Here is a small example of what you can do if you have this in a SQL Agent job that runs every 5 minutes or so, you can of course email yourself the results, dump the result into a table that is perhaps shown on a dashboard in the office, there are many possibilities.


--This will hold the rows
CREATE TABLE #ErrorLog (LogDate datetime, ProcessInfo VarChar(10), 
ErrorMessage VarChar(Max))

-- Dump the errorlog into the table
INSERT INTO #ErrorLog
EXEC master.dbo.xp_readerrorlog

-- Delete everything older than 5 minutes
-- ideally you will store the max date when it ran last
DELETE #ErrorLog
WHERE LogDate <  DATEADD(mi,-5,GETDATE())

-- Some stuff you want to check for
-- Failed backups...you want to know this
SELECT * FROM #ErrorLog
WHERE ErrorMessage LIKE'BACKUP failed%'

-- Why does it take so looong to grow a file, maybe rethink your settings
SELECT * FROM #ErrorLog
WHERE ErrorMessage LIKE'Autogrow of file%'

-- What is going on any backups or statistic updates running at this time?
SELECT * FROM #ErrorLog
WHERE ErrorMessage LIKE'SQL Server has encountered %occurrence(s) of I/O requests taking longer than%'

-- My mirror might not be up to date
SELECT * FROM #ErrorLog
WHERE ErrorMessage LIKE'The alert for ''unsent log'' has been raised%'


DROP TABLE #ErrorLog 

Those are just small samples, you might want to look for other kind of messages from the errorlog

The transaction log is full

You want to make sure that you know you are running out of space before you run out of space. I covered this in the SQL Server Maintenance post Take a look at the sections Make sure that you have enough space left on the drives and Make sure that you have enough space left for the filegroups In those two section I described what to look for and also supplied code that you can then plug into your own solution

The database is very slow

This complaint you hear every now and then, I have seen this from time to time. There are several things that could be happening, here is a list
Someone decided to take a backup of that 1 TB database in the middle of the day
The update statistics job is still running
Statistics are stale and haven't been updated in a long time
The virus scan is running amok and nobody told it to ignore the database files
Someone decided to query all the data all at once
If you have a tool like Quest FoglightConfio IgniteRed Gate SQL Monitor or similar then you can see what query ran at what time, what it did and how long it ran.
You can of course also use sp_who2, BlkBy column and DBCC INPUTBUFFER to see what is going on
If you like to use Dynamic Management Views, then take a look at Glenn Berry's SQL Server 2005 Diagnostic Information Queries  posts, there is a .sql file in each post with all kind of queries to discover all kinds of stuff about your server.
It could also be that your hardware is having issues, make sure the IOs look good and check the eventlog for any clues.

The latest backup we have is 9 days old

The following query will give you for all the databases the last time it was backed up or display NEVER if it wasn't backed up

SELECT s.Name AS DatabaseName,'Database backup was taken on  ' + 
CASE WHEN MAX(b.backup_finish_date) IS NULL THEN 'NEVER!!!' ELSE
CONVERT(VARCHAR(12), (MAX(b.backup_finish_date)), 101) END AS LastBackUpTime
FROM sys.sysdatabases s
LEFT OUTER JOIN msdb.dbo.backupset b ON b.database_name = s.name
GROUP BY s.Name

Here is what the output will look like

DatabaseName LastBackUpTime
--------------  ---------------------------------------
model         Database backup was taken on  NEVER!!!
msdb         Database backup was taken on  12/10/2012
ReportServer Database backup was taken on  NEVER!!!


As you can see that is not that great, all the databases should be backed up on a regular basis. Scroll up to the Utilize the logs section to see how you can check the errorlog for failed backup messages.

Everything is locked up, you can't get any results back from a query

Usually this indicates that there is an open transaction somewhere that has not finished or someone did the BEGIN TRAN part but never did a COMMIT or ROLLBACK.
Some people just restart the server to 'fix' the issue, of course if you do that you will never know what the root cause is and you never know when it will happen again.
We can easily show what happens when you have an open transaction, btw don't do this on the production server.

In 1 query window run this, replace SomeTable with a real table name.

BEGIN TRAN

SELECT TOP 1 * FROM SomeTable WITH(UPDLOCK, HOLDLOCK)

You will get a message that the query completed successfully
In another window run this

SELECT TOP 1 * FROM SomeTable WITH(UPDLOCK, HOLDLOCK)

That query won't return anything unless the first one is commited or rolled back

Now run this query below, the first column should have the text AWAITING COMMAND

SELECT   sys.cmd
        ,sys.last_batch
        ,lok.resource_type
        ,lok.resource_subtype
        ,DB_NAME(lok.resource_database_id)
        ,lok.resource_description
        ,lok.resource_associated_entity_id
        ,lok.resource_lock_partition
        ,lok.request_mode
        ,lok.request_type
        ,lok.request_status
        ,lok.request_owner_type
        ,lok.request_owner_id
        ,lok.lock_owner_address
        ,wat.waiting_task_address
        ,wat.session_id
        ,wat.exec_context_id
        ,wat.wait_duration_ms
        ,wat.wait_type
        ,wat.resource_address
        ,wat.blocking_task_address
        ,wat.blocking_session_id
        ,wat.blocking_exec_context_id
        ,wat.resource_description
FROM    sys.dm_tran_locks lok
JOIN    sys.dm_os_waiting_tasks wat
ON      lok.lock_owner_address = wat.resource_address
JOIN sys.sysprocesses sys ON wat.blocking_session_id = sys.spid


As you can see you have a blocking_session_id and a session_id, this will tell you which session_id is being blocked. You can now verify that the transaction session_id is blocking the other id
Go back to that first command window and execute a rollback

ROLLBACK

The query that had that second select should now be done as well, if you run that query that checks for the waits it should be clean as well.
Of course you could have done the same exercise by running sp_who2, looking at the BlkBy column, finding out what that session is doing by running DBCC INPUTBUFFER(session_id) with that session_id

Deadlocks are occurring

There is already a post written on LessThanDot explaining how you can get emailed when deadlocks occur. Ted Krueger wrote that post and it can be found here: Proactive Deadlock Notifications

Summary

I only touched the surface of what can be done in this post. I want you to find out if there is any monitoring being done in your shop, who gets notified? I have worked in places where the end user was the proactive notification, as long as we fixed it before the business users started to complaint life was good. Manual notifications and homebrew solutions might work for a while but when you add more and more servers and you add more people to the team this becomes laborious and error prone.

Thursday, January 03, 2019

Cursors and loops

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.


 Today we are going to look at cursors

Why do we hate those poor cursors?

Let's first see why people tend to use cursors. Let's say you come from a procedural language and this is the first time you are using SQL. In the procedural language you know how to traverse a list, you of course will look for something that is similar in SQL........bingo!!! you found it...the almighty cursor....the crusher of almost all SQL Server performance. You start using it, your code works, you are happy, life is good.

Now a team member tells you that the cursor is evil and should never ever be used. You are confused, if a cursor is never to be used then why is it part of the language? Well you might say the same for the GOTO statement, this exists in SQL.  Edsger Dijkstra's letter Go To Statement Considered Harmful was published in the March 1968 Communications of the ACM.

The reason that cursors are evil is that they tend to be slower than a set based solution. Cursors are not needed for 99% of the cases. SQL is a set based language, it works best with sets of data, not row by row processing, when you do something set based it will generally perform hundreds of times faster than using a cursor.

Take a look at this code

CREATE FUNCTION dbo.cursorEnroll ()
    RETURNS INT AS
    BEGIN
        DECLARE @studentsEnrolled INT
        SET @studentsEnrolled = 0
        DECLARE myCursor CURSOR FOR
            SELECT enrollementID
                FROM courseEnrollment
        OPEN myCursor;
 
        FETCH NEXT FROM myCursor INTO @studentsEnrolled
 
        WHILE @@FETCH_STATUS = 0
            BEGIN
                SET @studentsEnrolled = @studentsEnrolled+1
                    FETCH NEXT FROM myCursor INTO @studentsEnrolled
            END;
        CLOSE myCursor
        RETURN @studentsEnrolled
 
    END;

That whole flawed cursor logic can be replaced with one line of T-SQL

SELECT @studentsEnrolled = COUNT(*) FROM courseEnrollment

Which one do you think will perform faster?

What is more evil than a cursor?

If cursors are evil, then what is more evil than a cursor? Nested cursors of course, especially three nested cursors. Here is an example of some horrible code where a cursor is not needed

CREATE PROCEDURE [dbo].[sp_SomeMadeUpName]
AS

DECLARE @SomeDate DATETIME

SET @SomeDate =  CONVERT(CHAR(10),getDate(),112)

EXEC sp_createSomeLinkedServer @SomeDate,@SomeDate,12

DECLARE @sql NVARCHAR(2000), @SomeID VARCHAR(20)



DECLARE SomeCursor CURSOR 
FOR
SELECT DISTINCT SomeID
FROM SomeTable
WHERE getDate() BETWEEN SomeStart and SomeEnd


OPEN SomeCursor

FETCH NEXT FROM SomeCursor INTO @SomeID

WHILE @@FETCH_STATUS = 0

 BEGIN
 
 PRINT @SomeID
 
 SET @sql = ''
 SET @sql = @sql + N'DECLARE @Date DATETIME, @Value FLOAT' + char(13) + char(13)
 SET @sql = @sql + N'DECLARE curData CURSOR FOR' + char(13)
 SET @sql = @sql + N'SELECT * ' + char(13)
 SET @sql = @sql + N'FROM OPENQUERY(LinkedServerName,''SELECT Date,' + RTRIM(@SomeID) + ' FROM SomeTable'')' + char(13) + char(13)
 SET @sql = @sql + N'OPEN curData' + char(13) + char(13)
 SET @sql = @sql + N'FETCH NEXT FROM curData INTO @Date,@Value' + char(13)
 SET @sql = @sql + N'WHILE @@FETCH_STATUS = 0' + char(13)
 SET @sql = @sql + N'BEGIN' + char(13)
 SET @sql = @sql + N'INSERT INTO SomeTAble' + char(13)
 SET @sql = @sql + N'VALUES(''' + @SomeID + ''',@Date,@Value)' + char(13)
 SET @sql = @sql + N'FETCH NEXT FROM curData INTO @Date,@Value' + char(13)
 SET @sql = @sql + N'END' + char(13)
 SET @sql = @sql + N'CLOSE curData' + char(13) + char(13)
 SET @sql = @sql + N'DEALLOCATE curData' 

 PRINT @sql + char(13) + char(13)

 EXEC sp_ExecuteSQL @sql

 FETCH NEXT FROM SomeCursor INTO @SomeID

 END

CLOSE SomeCursor
DEALLOCATE SomeCursor


Why the need of looping over the list of IDs? Join with the linked server and do all this in 3 lines of code

I have seen some really horrible code with nested cursors, one example I saw was when someone needed to sum up some data, he created 3 nested cursor..the first one to loop over years, the second to loop over months and the third to loop over days.  This thing ran forever

All you need is to do a simple group by... for example

;WITH cte AS (SELECT DATEADD(dd,number,'20190101') AS TheDate,number
FROM master..spt_values WHERE type = 'p')

SELECT SUM(number) as Value,
YEAR(TheDAte)as TheYear, 
MONTH(TheDate) AS TheMonth 
FROM cte
GROUP BY YEAR(TheDate), MONTH(TheDate)
ORDER BY TheYear,TheMonth

Output


Replacing one evil with another

If you are using while loops instead of cursors then you really have not replaced the cursor at all, you are still not doing a set based operation, everything is still going row by row. Aaron Bertrand has a good post here, no need for me to repeat the same Bad Habits to Kick : Thinking a WHILE loop isn't a CURSOR

Loops in triggers

Usually you will find a loop in a trigger that will call some sort of stored procedures that needs to perform some kind of task for each row affected by the trigger. In the Triggers, when to use them, when not to use them post I already explained how to handle this scenario

Loops in stored procedures

Here is where you will find the biggest offenders. All you have to do to find the procs with cursors is run the following piece of code

SELECT * FROM sys.procedures 
WHERE OBJECT_DEFINITION((object_id) )LIKE '%DECLARE%%cursor%'

For while loops, just change the '%DECLARE%%cursor%' part to '%while%'
Look at those procs and investigate if you can rewrite them using a SET based operation instead

Loops for maintenance purposes

One of the few times I will use cursors or while loops is if I need to get information about tables or databases where I have to get information from a stored procedure.
Here is an example

CREATE TABLE #tempSpSpaceUsed (TableName VARCHAR(100),
        Rows INT,
        Reserved VARCHAR(100),
        Data VARCHAR(100),
        IndexSize VARCHAR(100),
        Unused VARCHAR(100))
        
        
SELECT name, ROW_NUMBER() OVER(ORDER BY name) AS id
INTO #temp
FROM sys.tables 

DECLARE @TableName VARCHAR(300)
DECLARE @loopid INT = 1, @maxID int = (SELECT MAX(id) FROM #temp)
WHILE @loopid <= @maxID
BEGIN

SELECT @TableName = name FROM #temp WHERE id = @loopid
INSERT #tempSpSpaceUsed
 EXEC('EXEC sp_spaceused ''' + @TableName + '''')
SET @loopid +=1
END

SELECT * FROM #tempSpSpaceUsed
 
Of course this can be simplified as well, you can just run this instead of the while loop and run the output

SELECT DISTINCT 'INSERT #tempSpSPaceUsed
EXEC sp_spaceused ''' + [name] + ''''
FROM sys.tables

Finally if you need more cursor goodness, take a look at these three posts which are also mentioned in the wiki article