Monday, May 14, 2007

Make Your Case Sensitive Searches 1000 Times Faster

I had an case sensitive update query that ran in about 9 minutes. Since it was a case sensitive query it did an index scan not an index seek.
Once I modified my WHERE clause the update took a little less than 3 seconds

Let's get started and see what I did

First create this table

CREATE TABLE #CaseSensitiveSearchTemp (Val CHAR(1))

INSERT #CaseSensitiveSearchTemp VALUES('A')
INSERT #CaseSensitiveSearchTemp VALUES('B')
INSERT #CaseSensitiveSearchTemp VALUES('C')
INSERT #CaseSensitiveSearchTemp VALUES('D')
INSERT #CaseSensitiveSearchTemp VALUES('E')
INSERT #CaseSensitiveSearchTemp VALUES('F')
INSERT #CaseSensitiveSearchTemp VALUES('G')
INSERT #CaseSensitiveSearchTemp VALUES('H')

Now we will insert some lowercase characters

INSERT #CaseSensitiveSearchTemp
SELECT LOWER(Val) FROM #CaseSensitiveSearchTemp

Now we will create our real table which will have 65536 rows

CREATE TABLE CaseSensitiveSearch (Val VARCHAR(50))

We will do a couple of cross joins to generate the data for our queries

INSERT CaseSensitiveSearch
SELECT t1.val + t2.val + t3.val + t4.val
FROM #CaseSensitiveSearchTemp t1
CROSS JOIN #CaseSensitiveSearchTemp t2
CROSS JOIN #CaseSensitiveSearchTemp t3
CROSS JOIN #CaseSensitiveSearchTemp t4


This should give you 65536 rows

SELECT * FROM CaseSensitiveSearch


Create an index on the table

CREATE INDEX IX_SearchVal ON CaseSensitiveSearch(Val)


This is how you do a case sensitive search

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS



Now hit CRLK + K (SQL Server 2000) or CRLK + M(SQL Server 2005)

run these 2 queries in one batch by highlighting them both and hitting F5

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

Look at the execution plan, I get 98.71% for the first query and 1.29% for the second query. Just by adding the AND condition SQL server is able to do an index seek and run the query many times faster




Now try it with a lowercase a

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'aBCD'


You see it all works without a problem, the correct result is returned

4 comments:

Anonymous said...

Interesting!

Further testing shows they have the same result

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'aBCD'

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD' -- this LIKE is case insensitive

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val = 'ABCD' -- case insensitive

Jon Kruger said...

Wow, that is amazing... I would've never guessed that a minor tweak like that would make such a difference.

Chase Saunders said...

Your example confused me because I thought you were trying to create a lookup table that would apply to many different cases. Once I got past this, I realized this is a well-known optimization.

No reason to use the LIKE operator, all that is needed is to perform the equality test once with collation and once without. See example 2 at:

http://vyaskn.tripod.com/case_sensitive_search_in_sql_server.htm

Chase Saunders said...

Not sure the pasted link worked in previous comment; seeing if I can get an HTML comment to work...

Try This Link