Wednesday, May 30, 2007

Speed Up Performance And Slash Your Table Size By 90% By Using Bitwise Logic

You have all seen websites where you can pick a bunch of categories by selection a bunch of check boxes. usually what you do is store those in a lookup table and then you create another table where you store all the categories for each customer.
What if I tell you that you can store all that info in 1 row instead of 10 rows if a customer picked 10 categories.
Take a look at this

1 Classic Rock
2 Hard Rock
4 Speed/Trash Metal
You will store a value of 1 + 2 + 4 = 7(you just sum the values)

Now run this to check, the result will be 7 for a match and some other value otherwise

select 7 | 1,
7 | 2,
7 |3,
7 |4,
7 |5,
7 |6,
7 |7,
7 |8,
7 |20

What is this |(pipe symbol)?
From Books on line
The bitwise operator performs a bitwise logical OR between the two expressions, taking each corresponding bit for both expressions. The bits in the result are set to 1 if either or both bits (for the current bit being resolved) in the input expressions have a value of 1; if neither bit in the input expressions is 1, the bit in the result is set to 0.
The bitwise operator requires two expressions, and it can be used on expressions of only the integer data type category.

Here is how you would typically use this, first create this table

CREATE TABLE NumbersTable (Num int)
INSERT NumbersTable VALUES(1)
INSERT NumbersTable VALUES(2)
INSERT NumbersTable VALUES(3)
INSERT NumbersTable VALUES(4)
INSERT NumbersTable VALUES(5)
INSERT NumbersTable VALUES(6)
INSERT NumbersTable VALUES(7)
INSERT NumbersTable VALUES(8)
INSERT NumbersTable VALUES(9)
INSERT NumbersTable VALUES(10)
INSERT NumbersTable VALUES(11)
INSERT NumbersTable VALUES(12)
GO

Now run this

SELECT
Num,
CASE 7 |Num WHEN 7 THEN 'Yes' ELSE 'No' END AS COL
FROM NumbersTable

Here is the output

Num COL
---- ---
1 Yes
2 Yes
3 Yes
4 Yes
5 Yes
6 Yes
7 Yes
8 No
9 No
10 No
11 No
12 No

Okay enough theory let's start with some SQL code. First create this table which will hold all the categories

CREATE TABLE MusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))

INSERT MusicChoice VALUES(1,'Classic Rock')
INSERT MusicChoice VALUES(2,'Hard Rock')
INSERT MusicChoice VALUES(3,'Speed/Trash Metal')
INSERT MusicChoice VALUES(4,'Classical')
INSERT MusicChoice VALUES(5,'Rap')
INSERT MusicChoice VALUES(6,'Blues')
INSERT MusicChoice VALUES(7,'Jazz')
INSERT MusicChoice VALUES(8,'Alternative Rock')
INSERT MusicChoice VALUES(9,'Easy Listening')
INSERT MusicChoice VALUES(10,'Progressive Rock')
INSERT MusicChoice VALUES(11,'Punk Rock')
INSERT MusicChoice VALUES(12,'Swing')
INSERT MusicChoice VALUES(13,'Techno')
INSERT MusicChoice VALUES(14,'Pop')
INSERT MusicChoice VALUES(15,'Disco')
INSERT MusicChoice VALUES(16,'Big Band')
INSERT MusicChoice VALUES(17,'Gospel')
INSERT MusicChoice VALUES(18,'Heavy Metal')
INSERT MusicChoice VALUES(19,'House')
INSERT MusicChoice VALUES(20,'Celtic')
Now create the Bitwise table

CREATE
TABLE BitwiseMusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))

We will use the POWER function to create the correct values
run this

SELECT
id,POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice

Here is the output
id BitID ChoiceDescription
1 1 Classic Rock
2 2 Hard Rock
3 4 Speed/Trash Metal
4 8 Classical
5 16 Rap
6 32 Blues
7 64 Jazz
8 128 Alternative Rock
9 256 Easy Listening
10 512 Progressive Rock
11 1024 Punk Rock
12 2048 Swing
13 4096 Techno
14 8192 Pop
15 16384 Disco
16 32768 Big Band
17 65536 Gospel
18 131072 Heavy Metal
19 262144 House
20 524288 Celtic

Now insert it into the BitwiseMusicChoice table

INSERT BitwiseMusicChoice
SELECT POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice

Now create this customer table

CREATE
TABLE Customer (CustomerID int identity, CustomerCode uniqueidentifier not null)

Insert these 5 values first, we will use these to compare performance later

INSERT Customer VALUES('1DAB5C03-BC23-4FB5-AC3D-A46489459FE9')
INSERT Customer VALUES('F7DDCDBC-F646-493A-B872-4E2E82EA8E14')
INSERT Customer VALUES('E8A4C3D2-AEB0-4821-A49D-3BF085354448')
INSERT Customer VALUES('52581088-C427-4D2F-A782-250564D44D8C')
INSERT Customer VALUES('1B2622C4-6C17-4E74-99D6-336197FBBCFF')

Now we will insert a total of 10000 customers

SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
SET
@LoopCounter = 6
WHILE @LoopCounter <= 10000
BEGIN
INSERT
Customer VALUES(NEWID())
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO

ALTER TABLE Customer ADD CONSTRAINT pk_Customer PRIMARY KEY (CustomerCode)

Create another table to hold the choices

CREATE
TABLE CustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier)

ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_MusicChoice_ID FOREIGN KEY (MusicChoiceID) REFERENCES MusicChoice(ID)

ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_CustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)

For each customer insert 10 random choices, this should run less than a minute

SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
DECLARE
@CustID uniqueidentifier
SET
@LoopCounter = 1
WHILE @LoopCounter <= 10000
BEGIN
SELECT
@CustID = CustomerCode
FROM Customer
WHERE CustomerID = @LoopCounter
INSERT Customer VALUES(NEWID())
INSERT CustomerMusicChoice(MusicChoiceID,CustomerCode)
SELECT TOP 10 id,@CustID
FROM MusicChoice
ORDER BY NEWID()
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO

CREATE INDEX ix_CustomerMusicChoice_Cust On CustomerMusicChoice(CustomerCode)

CREATE INDEX ix_CustomerMusicChoice_ID On CustomerMusicChoice(MusicChoiceID)

Create the BitwiseCustomerMusicChoice which will hold the Bitwise values

CREATE TABLE BitwiseCustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier not null)

This will populate the BitwiseCustomerMusicChoice table

INSERT INTO BitwiseCustomerMusicChoice
SELECT SUM(POWER(2,MusicChoiceID-1)) as MusicChoiceID,CustomerCode
FROM CustomerMusicChoice
GROUP BY CustomerCode

Add the index and foreign key

ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT pk_BitwiseCustomerMusicChoice PRIMARY KEY (CustomerCode)

ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT fk_BitwiseCustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)

Now let's test performance. Hit CTRL + K (SQL 2000) or CTRL + M (SQL 2005)

These 2 queries will return something like this

`ID ChoiceDescription Picked8 Alternative Rock No16 Big Band  No6 Blues   No20 Celtic   No1 Classic Rock  No4 Classical  Yes15 Disco   Yes9 Easy Listening  Yes17 Gospel   No2 Hard Rock  No18 Heavy Metal  Yes19 House   Yes7 Jazz   Yes14 Pop   Yes10 Progressive  Rock Yes11 Punk Rock  No5 Rap   No3 Speed/Trash Metal Yes12 Swing   Yes13 Techno   No`

SELECT
mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription

SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription

Look at the execution plan
67.60% against 32.40% not bad right?

Now run this, we will add AND bmc.ID > 0 to both queries. This will change an index scan to an index seek in the bottom query

SELECT mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND mc.ID > 0
ORDER BY ChoiceDescription

SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND bmc.ID > 0
ORDER BY ChoiceDescription

That improved the performance a little. 82.75% against 17.25%

Now look at the tables, after running dbcc showcontig you can see that the BitwiseCustomerMusicChoice is about 1/10th the size of the CustomerMusicChoice table which is as expected.

dbcc showcontig ('BitwiseCustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'BitwiseCustomerMusicChoice' table...
Table: 'BitwiseCustomerMusicChoice' (772197801); index ID: 1, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 41
- Extents Scanned..............................: 6
- Extent Switches..............................: 5
- Avg. Pages per Extent........................: 6.8
- Scan Density [Best Count:Actual Count].......: 100.00% [6:6]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 48.0
- Avg. Page Density (full).....................: 99.41%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

dbcc showcontig ('CustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'CustomerMusicChoice' table...
Table: 'CustomerMusicChoice' (724197630); index ID: 0, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 428
- Extents Scanned..............................: 55
- Extent Switches..............................: 54
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 98.18% [54:55]
- Extent Scan Fragmentation ...................: 40.00%
- Avg. Bytes Free per Page.....................: 386.5
- Avg. Page Density (full).....................: 95.22%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

What happens if you want to get the total count of for example Classical?

SELECT COUNT(*)
FROM CustomerMusicChoice cmc
JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
WHERE mc.ChoiceDescription ='Classical'

SELECT COUNT(*)
FROM BitwiseCustomerMusicChoice cmc
JOIN BitwiseMusicChoice bmc ON bmc.ID |MusicChoiceID =MusicChoiceID
WHERE bmc.ChoiceDescription ='Classical'

Here are execution plans for SQl Server 2000 and 2005

As you can see SQL Server 2005 has a bigger difference than SQL Server 2000

Now let's look at the overal picture, on a busy system you will have the customer queries running many times an hour/day. The report queries will run maybe a couple a times a day. I think this trade off is perfectly acceptable because overall your system will perform better. Another thing to keep in mind is that instead of 10 inserts you only have to do 1, same with updates, all these little things add up to a lot eventualy.

So as you can see using bitwise logic is a great way to accomplish a couple of things
Reduce table size
Speed up backup and recovery because your table is much smaller
Improve performance

Of course you have to do some testing for yourself because it might not be appropriate for your design. If your system is more of an OLAP than OLTP type of system then don't bother implementing this since it won't help you.

Cross-posted from SQLBlog! - http://www.sqlblog.com

1 comment:

Alexander Tarasul said...

This is great tip, but one sided - the opinion of performance DBA, but not an opinion of data architect.
Yes the solution will work fine for the data you described. It will stop working when number of categories will reach 32 - the int field just doesn't hold more than 31. You can patch it with bigint, but it will be broken again on 60 categories. Not good when code design limits data dictionaries.