SQL Server Code,Tips and Tricks, Performance Tuning: Gotcha

Showing posts with label Gotcha. Show all posts

Monday, March 15, 2021

After 20+ years in IT .. I finally discovered this useful command

Very similar to my After 20+ years in IT .. I finally discovered this... post, I discovered yet another command I should have know about

Originally I was not going to write this post, but after I found out that several other people didn't know this, I figured what the heck, why not, maybe someone will think this is cool as well

Open up a command prompt or powershell command window, , navigate to a folder, type in tree... hit enter

Here is what I see

I was watching a Pluralsight course and the person typed in the tree command.. and I was like whoaaaa.. How do I not know this? Perhaps maybe because I don't use the command window all that much? Anyway I thought that this was pretty cool

As you can see tree list all the directories and sub directories in a tree like structure. This is great to quickly see all the directories in one shot

The same command will work in Powershell

Finally, here is an image of the command window output as well as the explorer navigation pane side by side

Hopefully this will be useful to someone

Thursday, February 14, 2019

Finding rows where the column starts or ends with a 'bad' character

A coworker came to me asking me for some help. He had some issues trying to convert some data from a staging table to numeric. I asked him to show me the data in SSMS and at first glance it looked good to me. Then I asked where the data came from, he said it came from Excel.

Aha... I have plenty of war stories with Excel so I said, it's probably some non printable character that is in the column.. either a tab (char(9)) or a non breaking space (char(160))..especially if the value was copied from the internet

He said isnumeric was returning 0 for rows that looked valid, I then told him to run this query on those rows

SELECT ASCII(LEFT(SomeColumn,1)),
 ASCII(RIGHT(SomeColumn,1)),* 
FROM StagingData s

That would give them the ascii numerical value. For example a tab is 9, linefeed = 10....

Here is a chart for the characters between 0 and 32

Binary	Oct	Dec	Hex	Abbreviation			^[b]	^[c]	^[d]	Name (1967)
Binary	Oct	Dec	Hex	1963	1965	1967	^[b]	^[c]	^[d]	Name (1967)
000 0000	000	0	00	NULL	NUL		␀	`^@`	`\0`	Null
000 0001	001	1	01	SOM	SOH		␁	`^A`		Start of Heading
000 0010	002	2	02	EOA	STX		␂	`^B`		Start of Text
000 0011	003	3	03	EOM	ETX		␃	`^C`		End of Text
000 0100	004	4	04	EOT			␄	`^D`		End of Transmission
000 0101	005	5	05	WRU	ENQ		␅	`^E`		Enquiry
000 0110	006	6	06	RU	ACK		␆	`^F`		Acknowledgement
000 0111	007	7	07	BELL	BEL		␇	`^G`	`\a`	Bell
000 1000	010	8	08	FE0	BS		␈	`^H`	`\b`	Backspace^[e]^[f]
000 1001	011	9	09	HT/SK	HT		␉	`^I`	`\t`	Horizontal Tab^[g]
000 1010	012	10	0A	LF			␊	`^J`	`\n`	Line Feed
000 1011	013	11	0B	VTAB	VT		␋	`^K`	`\v`	Vertical Tab
000 1100	014	12	0C	FF			␌	`^L`	`\f`	Form Feed
000 1101	015	13	0D	CR			␍	`^M`	`\r`	Carriage Return^[h]
000 1110	016	14	0E	SO			␎	`^N`		Shift Out
000 1111	017	15	0F	SI			␏	`^O`		Shift In
001 0000	020	16	10	DC0	DLE		␐	`^P`		Data Link Escape
001 0001	021	17	11	DC1			␑	`^Q`		Device Control 1 (often XON)
001 0010	022	18	12	DC2			␒	`^R`		Device Control 2
001 0011	023	19	13	DC3			␓	`^S`		Device Control 3 (often XOFF)
001 0100	024	20	14	DC4			␔	`^T`		Device Control 4
001 0101	025	21	15	ERR	NAK		␕	`^U`		Negative Acknowledgement
001 0110	026	22	16	SYNC	SYN		␖	`^V`		Synchronous Idle
001 0111	027	23	17	LEM	ETB		␗	`^W`		End of Transmission Block
001 1000	030	24	18	S0	CAN		␘	`^X`		Cancel
001 1001	031	25	19	S1	EM		␙	`^Y`		End of Medium
001 1010	032	26	1A	S2	SS	SUB	␚	`^Z`		Substitute
001 1011	033	27	1B	S3	ESC		␛	`^[`	`\e`^[i]	Escape^[j]
001 1100	034	28	1C	S4	FS		␜	`^\`		File Separator
001 1101	035	29	1D	S5	GS		␝	`^]`		Group Separator
001 1110	036	30	1E	S6	RS		␞	`^^`^[k]		Record Separator
001 1111	037	31	1F	S7	US		␟	`^_`		Unit Separator

Source: https://en.wikipedia.org/wiki/ASCII

He then ran the following to grab all the rows that ended or started with tabs

SELECT * FROM StagingData s
WHERE LEFT(SomeColumn,1)  = char(9)
OR  RIGHT(SomeColumn,1)  = char(9)

So let's take another look at this to see how we can make this a little better

Let's create a table that will hold these bad characters that we don't want, in my case ACII values 1 untill 32

Here is what we will do to create and populate the table

CREATE TABLE BadCharacters(
  BadChar char(1) NOT NULL, 
  ASCIINumber int NOT NULL,
   CONSTRAINT pk_BadCharacters 
  PRIMARY KEY CLUSTERED( BadChar )
 )

GO

INSERT BadCharacters
SELECT char(number),number
FROM master..SPT_VALUES
WHERE type = 'P'
AND number BETWEEN 1 AND 32
OR number = 160

A quick look at the data looks like this

SELECT * FROM BadCharacters

Now let's create our staging table and insert some data so that we can do some tests

CREATE TABLE StagingData (SomeColumn varchar(255) )

INSERT StagingData
SELECT CONVERT(VARCHAR(10),s1.number) + '.' + CONVERT(VARCHAR(10),s2.number)
FROM master..SPT_VALUES s1
CROSS JOIN master..SPT_VALUES s2
WHERE s1.type = 'P'
AND s2.type = 'P'

That inserted 4194304 rows on my machine

Time to insert some of that bad data

Here is what some of the data inserted will look like

2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8

And this is the query to generated and insert those bad rows, on my machine it generated 1089 such rows

;WITH cte as(SELECT CONVERT(VARCHAR(10),number) as num
FROM master..SPT_VALUES
WHERE type = 'P'
AND number BETWEEN 1 AND 1000)

--INSERT StagingData
SELECT  b.BadChar + c1.num + '.' + c2.num + b2.BadChar
FROM cte c1
CROSS JOIN cte c2
JOIN BadCharacters b on c1.num = b.ASCIINumber
JOIN BadCharacters b2 on c2.num = b2.ASCIINumber

The query create a value by using a bad value, a number a dot a number and a bad value, you can see those values above

Now it's time to find these bad rows, but before we do that, let's add an index

CREATE INDEX ix_StagingData on StagingData(SomeColumn)

OK, we are ready...

Of course I here you saying, why don't we just do this

SELECT * FROM StagingData
WHERE TRY_CONVERT(numeric(20,10),SomeColumn) IS NULL

Well, yes that gives me everything that can't be converted to numeric, but I want to see what those characters are

Before we start, let's set statistics io on so that we can look at some performance

SET STATISTICS IO ON
GO

Here are the queries to find the bad characters at the start

SELECT * FROM StagingData s
JOIN BadCharacters b on b.BadChar = LEFT(s.SomeColumn,1)



SELECT * FROM StagingData s
JOIN BadCharacters b on s.SomeColumn like b.BadChar +'%'

Here is what the reads look like

(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 9, logical reads 10851, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 33, logical reads 135, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As you can see from the stats, the top query is non-SARGable and generates a lot more reads, the bottom query can use the index. Always make sure to write your queries in a way so that SQL Server can you an index

What about the last character, how can we find those

SELECT * FROM StagingData s
JOIN BadCharacters b on b.BadChar = RIGHT(s.SomeColumn,1)


SELECT * FROM StagingData s
JOIN BadCharacters b on s.SomeColumn like +'%' + b.BadChar

Here are the stats again

(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 9, logical reads 10851, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 33, logical reads 445863, physical reads 0, read-ahead reads 13, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So both of these queries suck the life out of your SQL Server instance, so what can be done?

One thing we can do is add a computed column to the table that will hold just the last character of the column, then we can index the computed column

Here are the commands to do that

ALTER TABLE StagingData ADD RightChar as RIGHT(SomeColumn,1)
GO


CREATE INDEX ix_RightChar on StagingData(RightChar)
GO

And now we can just run the same queries again

SELECT * FROM StagingData s
JOIN BadCharacters b on b.BadChar = RIGHT(s.SomeColumn,1)


SELECT * FROM StagingData s
JOIN BadCharacters b on s.SomeColumn like +'%' + b.BadChar

Here are the stats

(1089 row(s) affected)
Table 'StagingData'. Scan count 33, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1089 row(s) affected)
Table 'StagingData'. Scan count 33, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Did you expect to get the same exact reads for both queries?

So what is going on? Well lets take a look

In both cases, the optimizer was smart enough to use the index on the computed column

Hopefully this will make someone's life easier and you can expand the table to add other character you consider bad. You can also add constraint to reject values or you can add triggers and then move those bad rows to a bad rows table

Finally if you need to worry about unicode you might want to change the table to be nvarchar

Enjoy.. importing that data..... we all know..it's only getting bigger and bigger

Monday, February 04, 2019

After 20+ years in IT .. I finally discovered this...

Originally I was not going to write this post, but after I found out that several other people didn't know this I figured what the heck, why not, maybe this will help someone else as well

Last week I was on a remote session with 2 clients, each run the Advent program . The team I am part of provides a script to run the advent (APX or Axys) executable. This will then generate the portfolios, composites, price, security master, splits and other files. We then zip it up and sftp it over for ingestion so that we can run analytics and attribution

During these calls I interact with system administrators because usually the need to give permissions so that the script runs correctly

None of these admins knew that what I will show you existed. All the co-workers I asked didn't know this either (This could be because they are developers and not admins)

Back in the day (win 98 or perhaps NT 4), there was a windows powertool that you could install and if you right clicked on a folder you would get an option to open a command window and it would be in the path that you right clicked on

Those power tools don't exist anymore and you could do the same by hacking the registry, it's like a 16 step process

But there is a faster way.....

So what I usually did before 2 months ago is that I would select the path

And then I would open a command prompt, type CD and then paste the path...not too complicated

But here is the faster way.... instead of copying the path...just type in cmd in the address bar and hit enter

Boom shakalaka... a command prompt is opened immediately and you are in the same path

Did you know this also works when you type Powershell in the address bar, Eric Darling left me a comment on twitter informing me that it works with powershell as well

Here is what you see after typing it

So there you have it... hopefully it will save you some minutes of valuable time in a year

Also if you knew about this or did not know..leave a comment and let me know

Tuesday, January 08, 2019

Print.. the disruptor of batch deletes in SQL

Someone had an issue where a batched delete script was not deleting anything. I looked over some code in our repository and noticed two patterns the way queries are written to handle batch deletes

One is a while loop that runs while @@rowcount is greater than 0

WHILE @@rowcount > 0
 BEGIN
  DELETE TOP (5000)
  FROM SomeTable
 END

The other way is to run a while loop which is always true and then check if @@rowcount is 0, if it is 0 then break out of the loop

 WHILE 1 = 1  
    BEGIN  
        DELETE TOP(5000)  
        FROM SomeTable

        IF @@ROWCOUNT = 0  
     BREAK  
     END

I have always used WHILE @@rowcount > 0 but you have to be careful because @@rowcount could be 0 when your while loop starts

Let's take a look at an example. This is a simplified example without a where clause..but let's say you have to delete several million rows from a table with many more millions of rows and the table is replicated... in that case you want to batch the deletes so that your log file doesn't fill up, replication has a chance to catch up and in general the deletes should run faster

SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
  getutcdate() AS SomeDate, newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2

SELECT COUNT(*) from SomeTable 

SELECT * FROM SomeTable WHERE 1= 0

WHILE @@rowcount > 0
 BEGIN
  DELETE TOP (5000)
  FROM SomeTable
 END

SELECT COUNT(*) from SomeTable

DROP TABLE SomeTable -- Added here as cleanup in case people run the example

This is of course a silly example because why would you do a count like that against a different table before a delete

But what if you had this instead, you put a nice print statement there so that from the output you see when it started and you would also see the rowcounts?

SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
  getutcdate() AS SomeDate, newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2

SELECT COUNT(*) from SomeTable 

PRINT' Starting my update now....'

WHILE @@rowcount > 0
 BEGIN
  DELETE TOP (5000)
  FROM SomeTable
 END

SELECT COUNT(*) from SomeTable

DROP TABLE SomeTable -- Added here as cleanup in case people run the example

The count is 20000 before and after the loop, nothing got delete, this is because a print statement will reset @@rowcount to 0.

Take a look by running this simple set of queries

SELECT 1 UNION ALL SELECT 2
SELECT @@rowcount as 'Rowcount'
PRINT '1'
SELECT @@rowcount as 'RowcountAfterPrint'

Here is what the output looks like

After the PRINT line @@rowcount is reset back to 0

So if you want to use a while loop while checking @@rowcount, do this instead by running the delete first once outside the loop

SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
  getutcdate() AS SomeDate, newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2

SELECT COUNT(*) from SomeTable 

PRINT' Starting my update now....'

DELETE TOP (5000)
FROM SomeTable

WHILE @@rowcount > 0
 BEGIN
  DELETE TOP (5000)
  FROM SomeTable
 END

SELECT COUNT(*) from SomeTable

DROP TABLE SomeTable -- Added here as cleanup in case people run the example

If you run the delete this way if there was something to delete, the while loop would be entered, if the table was empty then there would no need to enter the while loop

Also keep in mind that it is not just PRINT that will reset @@rowcount back to 0.

From Books On Line:

Statements such as USE, SET <option>, DEALLOCATE CURSOR, CLOSE CURSOR, BEGIN TRANSACTION, or COMMIT TRANSACTION reset the ROWCOUNT value to 0.

That's all... hopefully this helps someone out in the future if they notice nothing gets deleted

Wednesday, September 26, 2018

A little less hate for: String or binary data would be truncated in table

The error String or binary data would be truncated is one of the more annoying things in SQL Server. You would insert some data and you would get back the error String or binary data would be truncated. Then you would have to spend a good amount of time to see what caused the error.

I even posted about this as part of T-SQL Tuesday #86: String or binary data would be truncated

I read the SQL Server 2019 CTP 2 whitepaper and on page 17 it has the following

Improve truncation message for ETL DW scenarios—the error message ID 8152 String or binary data would be truncated is familiar to many SQL Server developers and administrators who develop or maintain data movement workloads; the error is raised during data transfers between a source and a destination with different schemas when the source data is too large to fit into the destination data type. This error message can be time-consuming to troubleshoot because of its lack of specificity. SQL Server 2019 introduces a new, more specific error message for this scenario: ID 2628 String or binary data would be truncated in table '%.*ls', column '%.*ls'. Truncated value: '%.*ls'. The new error message provides more context for the problem, simplifying the troubleshooting process. So that it cannot break existing applications parsing message ID 8152, this new message ID 2628 is an opt-in replacement, which can be enabled with trace flag 460.

Oh really... they fixed this? Let's take a look

First I downloaded SQL Server 2019 CTP 2 and installed it. Then I created a database with a simple table, I also inserted some data that wouldn't fit

CREATE DATABASE truncatetest
GO

USE truncatetest
GO

CREATE TABLE TruncateMe(somevalue varchar(5),somevalue2 varchar(5), somedecimal decimal(5,2))
GO

INSERT TruncateMe
SELECT '333333','444444',5.3

I then received the following error message, so this is the same as in SQL Server 2018 and earlier, notice message id 8152

Msg 8152, Level 16, State 30, Line 10
String or binary data would be truncated.
The statement has been terminated.

To enable the new functionality, we need to enable trace flag 460, you can do that by running the DBCC TRACEON command like this

DBCC TRACEON(460)

Now let's try that insert statement again

INSERT TruncateMe
SELECT '333333','444444',5.3

And there we go, you get the table name, the column name as well as the value, notice that the message id changed from 8152 to 2628 now

Msg 2628, Level 16, State 1, Line 20
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue'. Truncated value: '33333'.
The statement has been terminated.

So it looks it only returns the first value that generates the error, let's change the first value to fit into the column and execute the insert statement again

INSERT TruncateMe
SELECT '3','444444',5.3

Now you will see that the error is for the somevalue2 column

Msg 2628, Level 16, State 1, Line 27
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue2'. Truncated value: '44444'.
The statement has been terminated.

What will happen if you have more than one row that fails?

insert TruncateMe
select  '333333','444444',5.3 union all
select '3','444444',5.3

Here is the error

Msg 2628, Level 16, State 1, Line 37
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue'. Truncated value: '33333'.
The statement has been terminated.

As you can see the error is only for the first row, not the second one

What about Table Variables, will you also get an error with the column and value like for real tables?

declare  @table table (somevalue varchar(5),somevalue2 varchar(5), somedecimal decimal(5,2))


insert @table
select '333333','444444',5.3

Here is the error

Msg 2628, Level 16, State 1, Line 53
String or binary data would be truncated in table 'tempdb.dbo.#A6AD698B', column 'somevalue'. Truncated value: '33333'.

As you can see you also get the error, the table name is the internal table name for the table variable tied to your session

What about Table Valued Parameters?

CREATE TYPE TestTypeTrunc AS TABLE   
( somevalue varchar(5),somevalue2 varchar(5));  
GO

DECLARE @table TestTypeTrunc
INSERT @table
SELECT '555555555','444444'

Here is the error for that

Msg 2628, Level 16, State 1, Line 58
String or binary data would be truncated in table 'tempdb.dbo.#AC6642E1', column 'somevalue'. Truncated value: '55555'.

Pretty much the same message as for the table variable

I do applaud Microsoft for finally fixing this.

Here it is also in beautiful technicolor