Thursday, January 03, 2019

Cursors and loops

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.


 Today we are going to look at cursors

Why do we hate those poor cursors?

Let's first see why people tend to use cursors. Let's say you come from a procedural language and this is the first time you are using SQL. In the procedural language you know how to traverse a list, you of course will look for something that is similar in SQL........bingo!!! you found it...the almighty cursor....the crusher of almost all SQL Server performance. You start using it, your code works, you are happy, life is good.

Now a team member tells you that the cursor is evil and should never ever be used. You are confused, if a cursor is never to be used then why is it part of the language? Well you might say the same for the GOTO statement, this exists in SQL.  Edsger Dijkstra's letter Go To Statement Considered Harmful was published in the March 1968 Communications of the ACM.

The reason that cursors are evil is that they tend to be slower than a set based solution. Cursors are not needed for 99% of the cases. SQL is a set based language, it works best with sets of data, not row by row processing, when you do something set based it will generally perform hundreds of times faster than using a cursor.

Take a look at this code

CREATE FUNCTION dbo.cursorEnroll ()
    RETURNS INT AS
    BEGIN
        DECLARE @studentsEnrolled INT
        SET @studentsEnrolled = 0
        DECLARE myCursor CURSOR FOR
            SELECT enrollementID
                FROM courseEnrollment
        OPEN myCursor;
 
        FETCH NEXT FROM myCursor INTO @studentsEnrolled
 
        WHILE @@FETCH_STATUS = 0
            BEGIN
                SET @studentsEnrolled = @studentsEnrolled+1
                    FETCH NEXT FROM myCursor INTO @studentsEnrolled
            END;
        CLOSE myCursor
        RETURN @studentsEnrolled
 
    END;

That whole flawed cursor logic can be replaced with one line of T-SQL

SELECT @studentsEnrolled = COUNT(*) FROM courseEnrollment

Which one do you think will perform faster?

What is more evil than a cursor?

If cursors are evil, then what is more evil than a cursor? Nested cursors of course, especially three nested cursors. Here is an example of some horrible code where a cursor is not needed

CREATE PROCEDURE [dbo].[sp_SomeMadeUpName]
AS

DECLARE @SomeDate DATETIME

SET @SomeDate =  CONVERT(CHAR(10),getDate(),112)

EXEC sp_createSomeLinkedServer @SomeDate,@SomeDate,12

DECLARE @sql NVARCHAR(2000), @SomeID VARCHAR(20)



DECLARE SomeCursor CURSOR 
FOR
SELECT DISTINCT SomeID
FROM SomeTable
WHERE getDate() BETWEEN SomeStart and SomeEnd


OPEN SomeCursor

FETCH NEXT FROM SomeCursor INTO @SomeID

WHILE @@FETCH_STATUS = 0

 BEGIN
 
 PRINT @SomeID
 
 SET @sql = ''
 SET @sql = @sql + N'DECLARE @Date DATETIME, @Value FLOAT' + char(13) + char(13)
 SET @sql = @sql + N'DECLARE curData CURSOR FOR' + char(13)
 SET @sql = @sql + N'SELECT * ' + char(13)
 SET @sql = @sql + N'FROM OPENQUERY(LinkedServerName,''SELECT Date,' + RTRIM(@SomeID) + ' FROM SomeTable'')' + char(13) + char(13)
 SET @sql = @sql + N'OPEN curData' + char(13) + char(13)
 SET @sql = @sql + N'FETCH NEXT FROM curData INTO @Date,@Value' + char(13)
 SET @sql = @sql + N'WHILE @@FETCH_STATUS = 0' + char(13)
 SET @sql = @sql + N'BEGIN' + char(13)
 SET @sql = @sql + N'INSERT INTO SomeTAble' + char(13)
 SET @sql = @sql + N'VALUES(''' + @SomeID + ''',@Date,@Value)' + char(13)
 SET @sql = @sql + N'FETCH NEXT FROM curData INTO @Date,@Value' + char(13)
 SET @sql = @sql + N'END' + char(13)
 SET @sql = @sql + N'CLOSE curData' + char(13) + char(13)
 SET @sql = @sql + N'DEALLOCATE curData' 

 PRINT @sql + char(13) + char(13)

 EXEC sp_ExecuteSQL @sql

 FETCH NEXT FROM SomeCursor INTO @SomeID

 END

CLOSE SomeCursor
DEALLOCATE SomeCursor


Why the need of looping over the list of IDs? Join with the linked server and do all this in 3 lines of code

I have seen some really horrible code with nested cursors, one example I saw was when someone needed to sum up some data, he created 3 nested cursor..the first one to loop over years, the second to loop over months and the third to loop over days.  This thing ran forever

All you need is to do a simple group by... for example

;WITH cte AS (SELECT DATEADD(dd,number,'20190101') AS TheDate,number
FROM master..spt_values WHERE type = 'p')

SELECT SUM(number) as Value,
YEAR(TheDAte)as TheYear, 
MONTH(TheDate) AS TheMonth 
FROM cte
GROUP BY YEAR(TheDate), MONTH(TheDate)
ORDER BY TheYear,TheMonth

Output


Replacing one evil with another

If you are using while loops instead of cursors then you really have not replaced the cursor at all, you are still not doing a set based operation, everything is still going row by row. Aaron Bertrand has a good post here, no need for me to repeat the same Bad Habits to Kick : Thinking a WHILE loop isn't a CURSOR

Loops in triggers

Usually you will find a loop in a trigger that will call some sort of stored procedures that needs to perform some kind of task for each row affected by the trigger. In the Triggers, when to use them, when not to use them post I already explained how to handle this scenario

Loops in stored procedures

Here is where you will find the biggest offenders. All you have to do to find the procs with cursors is run the following piece of code

SELECT * FROM sys.procedures 
WHERE OBJECT_DEFINITION((object_id) )LIKE '%DECLARE%%cursor%'

For while loops, just change the '%DECLARE%%cursor%' part to '%while%'
Look at those procs and investigate if you can rewrite them using a SET based operation instead

Loops for maintenance purposes

One of the few times I will use cursors or while loops is if I need to get information about tables or databases where I have to get information from a stored procedure.
Here is an example

CREATE TABLE #tempSpSpaceUsed (TableName VARCHAR(100),
        Rows INT,
        Reserved VARCHAR(100),
        Data VARCHAR(100),
        IndexSize VARCHAR(100),
        Unused VARCHAR(100))
        
        
SELECT name, ROW_NUMBER() OVER(ORDER BY name) AS id
INTO #temp
FROM sys.tables 

DECLARE @TableName VARCHAR(300)
DECLARE @loopid INT = 1, @maxID int = (SELECT MAX(id) FROM #temp)
WHILE @loopid <= @maxID
BEGIN

SELECT @TableName = name FROM #temp WHERE id = @loopid
INSERT #tempSpSpaceUsed
 EXEC('EXEC sp_spaceused ''' + @TableName + '''')
SET @loopid +=1
END

SELECT * FROM #tempSpSpaceUsed
 
Of course this can be simplified as well, you can just run this instead of the while loop and run the output

SELECT DISTINCT 'INSERT #tempSpSPaceUsed
EXEC sp_spaceused ''' + [name] + ''''
FROM sys.tables

Finally if you need more cursor goodness, take a look at these three posts which are also mentioned in the wiki article


No comments:

Post a Comment