Thursday, January 03, 2019

Cursors and loops

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.


 Today we are going to look at cursors

Why do we hate those poor cursors?

Let's first see why people tend to use cursors. Let's say you come from a procedural language and this is the first time you are using SQL. In the procedural language you know how to traverse a list, you of course will look for something that is similar in SQL........bingo!!! you found it...the almighty cursor....the crusher of almost all SQL Server performance. You start using it, your code works, you are happy, life is good.

Now a team member tells you that the cursor is evil and should never ever be used. You are confused, if a cursor is never to be used then why is it part of the language? Well you might say the same for the GOTO statement, this exists in SQL.  Edsger Dijkstra's letter Go To Statement Considered Harmful was published in the March 1968 Communications of the ACM.

The reason that cursors are evil is that they tend to be slower than a set based solution. Cursors are not needed for 99% of the cases. SQL is a set based language, it works best with sets of data, not row by row processing, when you do something set based it will generally perform hundreds of times faster than using a cursor.

Take a look at this code

CREATE FUNCTION dbo.cursorEnroll ()
    RETURNS INT AS
    BEGIN
        DECLARE @studentsEnrolled INT
        SET @studentsEnrolled = 0
        DECLARE myCursor CURSOR FOR
            SELECT enrollementID
                FROM courseEnrollment
        OPEN myCursor;
 
        FETCH NEXT FROM myCursor INTO @studentsEnrolled
 
        WHILE @@FETCH_STATUS = 0
            BEGIN
                SET @studentsEnrolled = @studentsEnrolled+1
                    FETCH NEXT FROM myCursor INTO @studentsEnrolled
            END;
        CLOSE myCursor
        RETURN @studentsEnrolled
 
    END;

That whole flawed cursor logic can be replaced with one line of T-SQL

SELECT @studentsEnrolled = COUNT(*) FROM courseEnrollment

Which one do you think will perform faster?

What is more evil than a cursor?

If cursors are evil, then what is more evil than a cursor? Nested cursors of course, especially three nested cursors. Here is an example of some horrible code where a cursor is not needed

CREATE PROCEDURE [dbo].[sp_SomeMadeUpName]
AS

DECLARE @SomeDate DATETIME

SET @SomeDate =  CONVERT(CHAR(10),getDate(),112)

EXEC sp_createSomeLinkedServer @SomeDate,@SomeDate,12

DECLARE @sql NVARCHAR(2000), @SomeID VARCHAR(20)



DECLARE SomeCursor CURSOR 
FOR
SELECT DISTINCT SomeID
FROM SomeTable
WHERE getDate() BETWEEN SomeStart and SomeEnd


OPEN SomeCursor

FETCH NEXT FROM SomeCursor INTO @SomeID

WHILE @@FETCH_STATUS = 0

 BEGIN
 
 PRINT @SomeID
 
 SET @sql = ''
 SET @sql = @sql + N'DECLARE @Date DATETIME, @Value FLOAT' + char(13) + char(13)
 SET @sql = @sql + N'DECLARE curData CURSOR FOR' + char(13)
 SET @sql = @sql + N'SELECT * ' + char(13)
 SET @sql = @sql + N'FROM OPENQUERY(LinkedServerName,''SELECT Date,' + RTRIM(@SomeID) + ' FROM SomeTable'')' + char(13) + char(13)
 SET @sql = @sql + N'OPEN curData' + char(13) + char(13)
 SET @sql = @sql + N'FETCH NEXT FROM curData INTO @Date,@Value' + char(13)
 SET @sql = @sql + N'WHILE @@FETCH_STATUS = 0' + char(13)
 SET @sql = @sql + N'BEGIN' + char(13)
 SET @sql = @sql + N'INSERT INTO SomeTAble' + char(13)
 SET @sql = @sql + N'VALUES(''' + @SomeID + ''',@Date,@Value)' + char(13)
 SET @sql = @sql + N'FETCH NEXT FROM curData INTO @Date,@Value' + char(13)
 SET @sql = @sql + N'END' + char(13)
 SET @sql = @sql + N'CLOSE curData' + char(13) + char(13)
 SET @sql = @sql + N'DEALLOCATE curData' 

 PRINT @sql + char(13) + char(13)

 EXEC sp_ExecuteSQL @sql

 FETCH NEXT FROM SomeCursor INTO @SomeID

 END

CLOSE SomeCursor
DEALLOCATE SomeCursor


Why the need of looping over the list of IDs? Join with the linked server and do all this in 3 lines of code

I have seen some really horrible code with nested cursors, one example I saw was when someone needed to sum up some data, he created 3 nested cursor..the first one to loop over years, the second to loop over months and the third to loop over days.  This thing ran forever

All you need is to do a simple group by... for example

;WITH cte AS (SELECT DATEADD(dd,number,'20190101') AS TheDate,number
FROM master..spt_values WHERE type = 'p')

SELECT SUM(number) as Value,
YEAR(TheDAte)as TheYear, 
MONTH(TheDate) AS TheMonth 
FROM cte
GROUP BY YEAR(TheDate), MONTH(TheDate)
ORDER BY TheYear,TheMonth

Output


Replacing one evil with another

If you are using while loops instead of cursors then you really have not replaced the cursor at all, you are still not doing a set based operation, everything is still going row by row. Aaron Bertrand has a good post here, no need for me to repeat the same Bad Habits to Kick : Thinking a WHILE loop isn't a CURSOR

Loops in triggers

Usually you will find a loop in a trigger that will call some sort of stored procedures that needs to perform some kind of task for each row affected by the trigger. In the Triggers, when to use them, when not to use them post I already explained how to handle this scenario

Loops in stored procedures

Here is where you will find the biggest offenders. All you have to do to find the procs with cursors is run the following piece of code

SELECT * FROM sys.procedures 
WHERE OBJECT_DEFINITION((object_id) )LIKE '%DECLARE%%cursor%'

For while loops, just change the '%DECLARE%%cursor%' part to '%while%'
Look at those procs and investigate if you can rewrite them using a SET based operation instead

Loops for maintenance purposes

One of the few times I will use cursors or while loops is if I need to get information about tables or databases where I have to get information from a stored procedure.
Here is an example

CREATE TABLE #tempSpSpaceUsed (TableName VARCHAR(100),
        Rows INT,
        Reserved VARCHAR(100),
        Data VARCHAR(100),
        IndexSize VARCHAR(100),
        Unused VARCHAR(100))
        
        
SELECT name, ROW_NUMBER() OVER(ORDER BY name) AS id
INTO #temp
FROM sys.tables 

DECLARE @TableName VARCHAR(300)
DECLARE @loopid INT = 1, @maxID int = (SELECT MAX(id) FROM #temp)
WHILE @loopid <= @maxID
BEGIN

SELECT @TableName = name FROM #temp WHERE id = @loopid
INSERT #tempSpSpaceUsed
 EXEC('EXEC sp_spaceused ''' + @TableName + '''')
SET @loopid +=1
END

SELECT * FROM #tempSpSpaceUsed
 
Of course this can be simplified as well, you can just run this instead of the while loop and run the output

SELECT DISTINCT 'INSERT #tempSpSPaceUsed
EXEC sp_spaceused ''' + [name] + ''''
FROM sys.tables

Finally if you need more cursor goodness, take a look at these three posts which are also mentioned in the wiki article


Tuesday, January 01, 2019

SQL Server Maintenance

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.

In this post  we are going to look at SQL Server maintenance

Just like with a car or a house, you need to do maintenance on databases as well. SQL Server has gotten better over the years, there are less knobs you need to turn out of the box but maintenance is still required.


In this post I will be looking at some stuff that you need to be aware of. Some of the things I will mention can be thought of as maintenance as well as regular checks. Think of a DBA as a car mechanic, instead of an oil change, tune up or checking the tire pressure, the DBA will check index fragmentation, run DBCC CHECKDB and make sure you have enough space for the database to grow for the next predetermined period.

The things I will cover in this post are: fragmentation of indexes, free drives space, space in filegroups, running DBCC CHECKDB and finally making sure that you have the latest source code of your objects in a source control system.

Check fragmentation of indexes

A lot of time your index will get fragmented over time if you do a lot of updates or inserts and deletes.

Now instead of rolling your own solution, you should take a look at some of them that are out there and used by many people. Take a look at SQL Server Index and Statistics Maintenance by Ola Hallengren, You can also get the scripts from Github here: https://github.com/olahallengren/sql-server-maintenance-solution


Check that your database is healthy by running DBCC CHECKDB

What does DBCC CHECKDB do? Here is the explanation from Books On Line
Checks the logical and physical integrity of all the objects in the specified database by performing the following operations:

  • Runs DBCC CHECKALLOC on the database.
  • Runs DBCC CHECKTABLE on every table and view in the database.
  • Runs DBCC CHECKCATALOG on the database.
  • Validates the contents of every indexed view in the database.
  • Validates link-level consistency between table metadata and file system directories and files when storing varbinary(max) data in the file system using FILESTREAM.
  • Validates the Service Broker data in the database.
So how frequent should you be running DBCC CHECKDB? Ideally you should be running DBCC CHECKDB as frequent as possible, do you want to find out that there is corruption when it is very difficult to fix since two weeks have passed or do you want to find out the same day so that you can fix the table immediately.

Paul Randal who worked on DBCC CHECKDB has a whole bunch of blog posts about DBCC CHECKDB, the posts can be found here http://www.sqlskills.com/blogs/paul/category/checkdb-from-every-angle.aspx

Make sure that you have enough space left on the drives

Running out of space on a drive is not fun stuff, suddenly you can't insert any more data into your tables because no new pages can be allocated. If you have tools in your shop like cacti then this is probably already monitored. If you don't have any tools then either get a tool or roll your own. Here is how you can get the free space fo the drives with T-SQL


CREATE TABLE #FixedDrives(Drive CHAR(1),MBFree INT)

INSERT #FixedDrives
EXEC xp_fixeddrives

SELECT * FROM #FixedDrives


Here is the output for one of my servers

Drive MBFree
------------------
C 6916  -- System
D 28921 -- Apps
L 52403 -- Log
M 4962  -- System databases
T 86208 -- Temps
U 71075 -- User databases 
V 212075-- User databases 


Here is a simple way of using T-SQL to create a SQL Agent job that runs every 10 minutes and will send an email if you go below the threshold that you specified. This code is very simple and is just to show you that you can do this in T-SQL. You can make it more dynamic/configurable by not hard-coding the drives or thresholds

DECLARE @MBFreeD INT
DECLARE @MBFreeE INT
CREATE TABLE #FixedDrives(Drive CHAR(1),MBFree INT)

INSERT #FixedDrives
EXEC xp_fixeddrives

SELECT @MBFreeD =  MBFree
FROM #FixedDrives
WHERE DRIVE = 'D'

SELECT @MBFreeE =  MBFree
FROM #FixedDrives
WHERE DRIVE = 'E'


DROP TABLE #FixedDrives

IF @MBFreeD < 30000 OR @MBFreeE < 10000
BEGIN
      DECLARE @Recipients VARCHAR(8000)
   SELECT @Recipients ='SomeGroup@SomeEmail.com'
       
  DECLARE @p_body AS NVARCHAR(MAX), @p_subject AS NVARCHAR(MAX), @p_profile_name AS NVARCHAR(MAX)

  SET @p_subject = @@SERVERNAME + N'  Drive Space is running low'
  SET @p_body = ' Drive Space is running low <br><br><br>' + CHAR(13) + CHAR(10) + 'Drive D has ' 
  + CONVERT(VARCHAR(20),@MBFreeD) + ' MB left <br>' + CHAR(13) + CHAR(10) + 'Drive E has ' 
  + CONVERT(VARCHAR(20),@MBFreeE) + ' MB left'

  EXEC msdb.dbo.sp_send_dbmail
     @recipients = @Recipients,
     @body = @p_body,
     @body_format = 'HTML',
     @subject = @p_subject
END

Make sure that you have enough space left for the filegroups

In the Sizing database files I talked about the importance of sizing database files. Just like you can run out of hard drive space, you can also fill up a file used by SQL Server. here is query that will tell you how big the file is, how much space is use and how much free space is left. You can use a query like this to alert you before you run out of space


SELECT
 a.FILEID,
 [FILE_SIZE_MB] = 
  CONVERT(DECIMAL(12,2),ROUND(a.size/128.000,2)),
 [SPACE_USED_MB] =
  CONVERT(DECIMAL(12,2),ROUND(FILEPROPERTY(a.name,'SpaceUsed')/128.000,2)),
 [FREE_SPACE_MB] =
  CONVERT(DECIMAL(12,2),ROUND((a.size-FILEPROPERTY(a.name,'SpaceUsed'))/128.000,2)) ,
 NAME = LEFT(a.NAME,35),
 FILENAME = LEFT(a.FILENAME,60)
FROM
 dbo.sysfiles a

Have the latest scripts of all your objects

You might say that you have all the code for your objects in the database. What if you want to go back to the version of the proc from 3 days ago, is it really easier to restore a 800 GB backup from 3 days ago just to get the stored proc code?

Of course not, make sure that you have DDL scripts of every object in source control, your life will be much easier.
I only touched on a couple of points here, some of the things mentioned here will also show up in the proactive notifications post in a couple of days. There is much more to maintenance than this, keep informed and make sure you understand what needs to be done.


Some more repos for you to use


Take a look at the GitHub repositories mentioned in this post: Five great SQL Server GitHub repos that every SQL Server person should check out  There are some good ones like dbatools and tigertoolbox

Friday, November 23, 2018

Happy Fibonacci day, here is how to generate a Fibonacci sequence in SQL


Image by Jahobr - Own work, CC0, Link


Since today is Fibonacci day I decided to to a short post about how to do generate a Fibonacci sequence in T-SQL. But first let's take a look at what a Fibonacci sequence actually is.

In mathematics, the Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones:

 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

Often, especially in modern usage, the sequence is extended by one more initial term:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

November 23 is celebrated as Fibonacci day because when the date is written in the mm/dd format (11/23), the digits in the date form a Fibonacci sequence: 1,1,2,3.

So here is how you can generate a Fibonacci sequence in SQL, you can do it by using s recursive table expression.  Here is what it looks like if you wanted to generate the Fibonacci sequence to up to a value of 1 million

;WITH Fibonacci (Prev, Next) as
(
     SELECT 1, 1
     UNION ALL
     SELECT Next, Prev + Next
     FROM Fibonacci
     WHERE Next < 1000000
)
SELECT Prev as Fibonacci
     FROM Fibonacci
     WHERE Prev < 1000000




That will generate a Fibonacci sequence that starts with 1, if you need a Fibonacci sequence that start with 0, all you have to do is replace the 1 to 0 in the first select statement

;WITH Fibonacci (Prev, Next) as
(
     SELECT 1, 1
     UNION ALL
     SELECT Next, Prev + Next
     FROM Fibonacci
     WHERE Next < 1000000
)
SELECT Prev as Fibonacci
     FROM Fibonacci
     WHERE Prev < 1000000


Here is what it looks like in SSMS



Happy Fibonacci day!!

I created the same for PostgreSQL, the only difference is that you need to add the keyword RECURSIVE in the CTE, here is that post  Happy Fibonacci day, here is how to generate a Fibonacci sequence in PostgreSQL

Wednesday, September 26, 2018

A little less hate for: String or binary data would be truncated in table


The error String or binary data would be truncated is one of the more annoying things in SQL Server. You would insert some data and you would get back the error String or binary data would be truncated. Then you would have to spend a good amount of time to see what caused the error.

I even posted about this as part of  T-SQL Tuesday #86: String or binary data would be truncated

I read the SQL Server 2019 CTP 2 whitepaper and on page 17 it has the following

Improve truncation message for ETL DW scenarios—the error message ID 8152 String or binary data would be truncated is familiar to many SQL Server developers and administrators who develop or maintain data movement workloads; the error is raised during data transfers between a source and a destination with different schemas when the source data is too large to fit into the destination data type. This error message can be time-consuming to troubleshoot because of its lack of specificity. SQL Server 2019 introduces a new, more specific error message for this scenario: ID 2628 String or binary data would be truncated in table '%.*ls', column '%.*ls'. Truncated value: '%.*ls'. The new error message provides more context for the problem, simplifying the troubleshooting process. So that it cannot break existing applications parsing message ID 8152, this new message ID 2628 is an opt-in replacement, which can be enabled with trace flag 460. 


Oh really... they fixed this? Let's take a look

First I downloaded SQL Server 2019 CTP 2 and installed it. Then I created a database with a simple table, I also inserted some data that wouldn't fit

CREATE DATABASE truncatetest
GO

USE truncatetest
GO

CREATE TABLE TruncateMe(somevalue varchar(5),somevalue2 varchar(5), somedecimal decimal(5,2))
GO

INSERT TruncateMe
SELECT '333333','444444',5.3

I then received the following error message, so this is the same as in SQL Server 2018 and earlier, notice message id 8152

Msg 8152, Level 16, State 30, Line 10
String or binary data would be truncated.
The statement has been terminated.

To enable the new functionality, we need to enable trace flag 460, you can do that by running the DBCC TRACEON command like this

DBCC TRACEON(460)

Now let's try that insert statement again

INSERT TruncateMe
SELECT '333333','444444',5.3

And there we go, you get the table name, the column name as well as the value, notice that the message id changed from 8152 to 2628 now

Msg 2628, Level 16, State 1, Line 20
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue'. Truncated value: '33333'.
The statement has been terminated.


So it looks it only returns the first value that generates the error, let's change the first value to fit into the column and execute the insert statement again

INSERT TruncateMe
SELECT '3','444444',5.3

Now you will see that the error is for the somevalue2 column


Msg 2628, Level 16, State 1, Line 27
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue2'. Truncated value: '44444'.
The statement has been terminated.



What will happen if you have more than one row that fails?

insert TruncateMe
select  '333333','444444',5.3 union all
select '3','444444',5.3

Here is the error

Msg 2628, Level 16, State 1, Line 37
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue'. Truncated value: '33333'.
The statement has been terminated.

As you can see the error is only for the first row, not the second one

What about Table Variables, will you also get an error with the column and value like for real tables?

declare  @table table (somevalue varchar(5),somevalue2 varchar(5), somedecimal decimal(5,2))


insert @table
select '333333','444444',5.3


Here is the error

Msg 2628, Level 16, State 1, Line 53
String or binary data would be truncated in table 'tempdb.dbo.#A6AD698B', column 'somevalue'. Truncated value: '33333'.

As you can see you also get the error, the table name is the internal table name for the table variable tied to your session


What about Table Valued Parameters?

CREATE TYPE TestTypeTrunc AS TABLE   
( somevalue varchar(5),somevalue2 varchar(5));  
GO

DECLARE @table TestTypeTrunc
INSERT @table
SELECT '555555555','444444'

Here is the error for that 

Msg 2628, Level 16, State 1, Line 58
String or binary data would be truncated in table 'tempdb.dbo.#AC6642E1', column 'somevalue'. Truncated value: '55555'.


Pretty much the same message as for the table variable


I do applaud Microsoft for finally fixing this.

Here it is also in beautiful technicolor