Someone had an issue where a batched delete script was not deleting anything. I looked over some code in our repository and noticed two patterns the way queries are written to handle batch deletes
One is a while loop that runs while @@rowcount is greater than 0
WHILE @@rowcount > 0 BEGIN DELETE TOP (5000) FROM SomeTable END
The other way is to run a while loop which is always true and then check if @@rowcount is 0, if it is 0 then break out of the loop
WHILE 1 = 1 BEGIN DELETE TOP(5000) FROM SomeTable IF @@ROWCOUNT = 0 BREAK END
I have always used WHILE @@rowcount > 0 but you have to be careful because @@rowcount could be 0 when your while loop starts
Let's take a look at an example. This is a simplified example without a where clause..but let's say you have to delete several million rows from a table with many more millions of rows and the table is replicated... in that case you want to batch the deletes so that your log file doesn't fill up, replication has a chance to catch up and in general the deletes should run faster
SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId, getutcdate() AS SomeDate, newid() AS SomeValue INTO SomeTable FROM sys.sysobjects t1 CROSS JOIN sys.sysobjects t2 SELECT COUNT(*) from SomeTable SELECT * FROM SomeTable WHERE 1= 0 WHILE @@rowcount > 0 BEGIN DELETE TOP (5000) FROM SomeTable END SELECT COUNT(*) from SomeTable
DROP TABLE SomeTable -- Added here as cleanup in case people run the example
This is of course a silly example because why would you do a count like that against a different table before a delete
But what if you had this instead, you put a nice print statement there so that from the output you see when it started and you would also see the rowcounts?
SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId, getutcdate() AS SomeDate, newid() AS SomeValue INTO SomeTable FROM sys.sysobjects t1 CROSS JOIN sys.sysobjects t2 SELECT COUNT(*) from SomeTable PRINT' Starting my update now....' WHILE @@rowcount > 0 BEGIN DELETE TOP (5000) FROM SomeTable END SELECT COUNT(*) from SomeTable
DROP TABLE SomeTable -- Added here as cleanup in case people run the example
The count is 20000 before and after the loop, nothing got delete, this is because a print statement will reset @@rowcount to 0.
Take a look by running this simple set of queries
SELECT 1 UNION ALL SELECT 2 SELECT @@rowcount as 'Rowcount' PRINT '1' SELECT @@rowcount as 'RowcountAfterPrint'
After the PRINT line @@rowcount is reset back to 0
So if you want to use a while loop while checking @@rowcount, do this instead by running the delete first once outside the loop
SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId, getutcdate() AS SomeDate, newid() AS SomeValue INTO SomeTable FROM sys.sysobjects t1 CROSS JOIN sys.sysobjects t2 SELECT COUNT(*) from SomeTable PRINT' Starting my update now....' DELETE TOP (5000) FROM SomeTable WHILE @@rowcount > 0 BEGIN DELETE TOP (5000) FROM SomeTable END SELECT COUNT(*) from SomeTable
If you run the delete this way if there was something to delete, the while loop would be entered, if the table was empty then there would no need to enter the while loopDROP TABLE SomeTable -- Added here as cleanup in case people run the example
Also keep in mind that it is not just PRINT that will reset @@rowcount back to 0.
From Books On Line:
Statements such as USE, SET <option>, DEALLOCATE CURSOR, CLOSE CURSOR, BEGIN TRANSACTION, or COMMIT TRANSACTION reset the ROWCOUNT value to 0.
That's all... hopefully this helps someone out in the future if they notice nothing gets deleted
3 comments:
I am struggling to come up with a reason to delete in this manner. The examples you've given all delete every single row from the table, or at least want to. Why not use an unqualified DELETE FROM or TRUNCATE? Hell, sometimes you even drop the table afterwards. To me, that screams temporary table.
This is a simplified example without a where clause..but let's say you have to delete several million rows from a table with many more millions of rows and the table is replicated... in that case you want to batch the deletes
Personally I would never do
WHILE @@rowcount > 0
BEGIN
DELETE TOP (5000)
FROM SomeTable
END
because of the risk that some new code was introduced after the DELETE statement which then interfered with the value of @@ROWCOUNT. What I know of as "defensive programming"
e.g. a PRINT statement, or a total rows deleted count, a conditional abort after X-Minutes have been spent, a WAITFOR to allow other processes to run ... or even some sort of DEBUG statement. IME those sorts of things are routinely added "later on" for one reason or another ... and worth programming to avoid someone (me even!) making a change without realising the side-effects.
Thus my preference would be:
SELECT @MyRowCount = 1 -- Force first iteration
WHILE @MyRowCount > 0
BEGIN
DELETE TOP (5000)
FROM SomeTable
SELECT @MyRowCount = @@ROWCOUNT
END
Post a Comment