Wednesday, May 30, 2007

Speed Up Performance And Slash Your Table Size By 90% By Using Bitwise Logic

You have all seen websites where you can pick a bunch of categories by selection a bunch of check boxes. usually what you do is store those in a lookup table and then you create another table where you store all the categories for each customer.
What if I tell you that you can store all that info in 1 row instead of 10 rows if a customer picked 10 categories.
Take a look at this




1 Classic Rock
2 Hard Rock
4 Speed/Trash Metal
You will store a value of 1 + 2 + 4 = 7(you just sum the values)

Now run this to check, the result will be 7 for a match and some other value otherwise



select 7 | 1,
7 | 2,
7 |3,
7 |4,
7 |5,
7 |6,
7 |7,
7 |8,
7 |20



What is this |(pipe symbol)?
From Books on line
The bitwise operator performs a bitwise logical OR between the two expressions, taking each corresponding bit for both expressions. The bits in the result are set to 1 if either or both bits (for the current bit being resolved) in the input expressions have a value of 1; if neither bit in the input expressions is 1, the bit in the result is set to 0.
The bitwise operator requires two expressions, and it can be used on expressions of only the integer data type category.




Here is how you would typically use this, first create this table




CREATE TABLE NumbersTable (Num int)
INSERT NumbersTable VALUES(1)
INSERT NumbersTable VALUES(2)
INSERT NumbersTable VALUES(3)
INSERT NumbersTable VALUES(4)
INSERT NumbersTable VALUES(5)
INSERT NumbersTable VALUES(6)
INSERT NumbersTable VALUES(7)
INSERT NumbersTable VALUES(8)
INSERT NumbersTable VALUES(9)
INSERT NumbersTable VALUES(10)
INSERT NumbersTable VALUES(11)
INSERT NumbersTable VALUES(12)
GO

Now run this

SELECT
Num,
CASE 7 |Num WHEN 7 THEN 'Yes' ELSE 'No' END AS COL
FROM NumbersTable

Here is the output




Num COL
---- ---
1 Yes
2 Yes
3 Yes
4 Yes
5 Yes
6 Yes
7 Yes
8 No
9 No
10 No
11 No
12 No




Okay enough theory let's start with some SQL code. First create this table which will hold all the categories




CREATE TABLE MusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))




INSERT MusicChoice VALUES(1,'Classic Rock')
INSERT MusicChoice VALUES(2,'Hard Rock')
INSERT MusicChoice VALUES(3,'Speed/Trash Metal')
INSERT MusicChoice VALUES(4,'Classical')
INSERT MusicChoice VALUES(5,'Rap')
INSERT MusicChoice VALUES(6,'Blues')
INSERT MusicChoice VALUES(7,'Jazz')
INSERT MusicChoice VALUES(8,'Alternative Rock')
INSERT MusicChoice VALUES(9,'Easy Listening')
INSERT MusicChoice VALUES(10,'Progressive Rock')
INSERT MusicChoice VALUES(11,'Punk Rock')
INSERT MusicChoice VALUES(12,'Swing')
INSERT MusicChoice VALUES(13,'Techno')
INSERT MusicChoice VALUES(14,'Pop')
INSERT MusicChoice VALUES(15,'Disco')
INSERT MusicChoice VALUES(16,'Big Band')
INSERT MusicChoice VALUES(17,'Gospel')
INSERT MusicChoice VALUES(18,'Heavy Metal')
INSERT MusicChoice VALUES(19,'House')
INSERT MusicChoice VALUES(20,'Celtic')
Now create the Bitwise table

CREATE
TABLE BitwiseMusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))





We will use the POWER function to create the correct values
run this

SELECT
id,POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice




Here is the output
id BitID ChoiceDescription
1 1 Classic Rock
2 2 Hard Rock
3 4 Speed/Trash Metal
4 8 Classical
5 16 Rap
6 32 Blues
7 64 Jazz
8 128 Alternative Rock
9 256 Easy Listening
10 512 Progressive Rock
11 1024 Punk Rock
12 2048 Swing
13 4096 Techno
14 8192 Pop
15 16384 Disco
16 32768 Big Band
17 65536 Gospel
18 131072 Heavy Metal
19 262144 House
20 524288 Celtic




Now insert it into the BitwiseMusicChoice table




INSERT BitwiseMusicChoice
SELECT POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice




Now create this customer table


CREATE
TABLE Customer (CustomerID int identity, CustomerCode uniqueidentifier not null)




Insert these 5 values first, we will use these to compare performance later




INSERT Customer VALUES('1DAB5C03-BC23-4FB5-AC3D-A46489459FE9')
INSERT Customer VALUES('F7DDCDBC-F646-493A-B872-4E2E82EA8E14')
INSERT Customer VALUES('E8A4C3D2-AEB0-4821-A49D-3BF085354448')
INSERT Customer VALUES('52581088-C427-4D2F-A782-250564D44D8C')
INSERT Customer VALUES('1B2622C4-6C17-4E74-99D6-336197FBBCFF')

Now we will insert a total of 10000 customers




SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
SET
@LoopCounter = 6
WHILE @LoopCounter <= 10000
BEGIN
INSERT
Customer VALUES(NEWID())
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO




Now add the primary key




ALTER TABLE Customer ADD CONSTRAINT pk_Customer PRIMARY KEY (CustomerCode)

Create another table to hold the choices

CREATE
TABLE CustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier)




ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_MusicChoice_ID FOREIGN KEY (MusicChoiceID) REFERENCES MusicChoice(ID)




ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_CustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)




For each customer insert 10 random choices, this should run less than a minute



SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
DECLARE
@CustID uniqueidentifier
SET
@LoopCounter = 1
WHILE @LoopCounter <= 10000
BEGIN
SELECT
@CustID = CustomerCode
FROM Customer
WHERE CustomerID = @LoopCounter
INSERT Customer VALUES(NEWID())
INSERT CustomerMusicChoice(MusicChoiceID,CustomerCode)
SELECT TOP 10 id,@CustID
FROM MusicChoice
ORDER BY NEWID()
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO

Now add these indexes
CREATE INDEX ix_CustomerMusicChoice_Cust On CustomerMusicChoice(CustomerCode)



CREATE INDEX ix_CustomerMusicChoice_ID On CustomerMusicChoice(MusicChoiceID)




Create the BitwiseCustomerMusicChoice which will hold the Bitwise values




CREATE TABLE BitwiseCustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier not null)




This will populate the BitwiseCustomerMusicChoice table




INSERT INTO BitwiseCustomerMusicChoice
SELECT SUM(POWER(2,MusicChoiceID-1)) as MusicChoiceID,CustomerCode
FROM CustomerMusicChoice
GROUP BY CustomerCode




Add the index and foreign key




ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT pk_BitwiseCustomerMusicChoice PRIMARY KEY (CustomerCode)




ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT fk_BitwiseCustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)

Now let's test performance. Hit CTRL + K (SQL 2000) or CTRL + M (SQL 2005)



These 2 queries will return something like this




ID ChoiceDescription Picked
8 Alternative Rock No
16 Big Band No
6 Blues No
20 Celtic No
1 Classic Rock No
4 Classical Yes
15 Disco Yes
9 Easy Listening Yes
17 Gospel No
2 Hard Rock No
18 Heavy Metal Yes
19 House Yes
7 Jazz Yes
14 Pop Yes
10 Progressive Rock Yes
11 Punk Rock No
5 Rap No
3 Speed/Trash Metal Yes
12 Swing Yes
13 Techno No


SELECT
mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription




SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription




Look at the execution plan
67.60% against 32.40% not bad right?



Plan1




Now run this, we will add AND bmc.ID > 0 to both queries. This will change an index scan to an index seek in the bottom query




SELECT mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND mc.ID > 0
ORDER BY ChoiceDescription




SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND bmc.ID > 0
ORDER BY ChoiceDescription



That improved the performance a little. 82.75% against 17.25%





Plan2




Now look at the tables, after running dbcc showcontig you can see that the BitwiseCustomerMusicChoice is about 1/10th the size of the CustomerMusicChoice table which is as expected.


dbcc showcontig ('BitwiseCustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'BitwiseCustomerMusicChoice' table...
Table: 'BitwiseCustomerMusicChoice' (772197801); index ID: 1, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 41
- Extents Scanned..............................: 6
- Extent Switches..............................: 5
- Avg. Pages per Extent........................: 6.8
- Scan Density [Best Count:Actual Count].......: 100.00% [6:6]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 48.0
- Avg. Page Density (full).....................: 99.41%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.




dbcc showcontig ('CustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'CustomerMusicChoice' table...
Table: 'CustomerMusicChoice' (724197630); index ID: 0, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 428
- Extents Scanned..............................: 55
- Extent Switches..............................: 54
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 98.18% [54:55]
- Extent Scan Fragmentation ...................: 40.00%
- Avg. Bytes Free per Page.....................: 386.5
- Avg. Page Density (full).....................: 95.22%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


What happens if you want to get the total count of for example Classical?




SELECT COUNT(*)
FROM CustomerMusicChoice cmc
JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
WHERE mc.ChoiceDescription ='Classical'

SELECT COUNT(*)
FROM BitwiseCustomerMusicChoice cmc
JOIN BitwiseMusicChoice bmc ON bmc.ID |MusicChoiceID =MusicChoiceID
WHERE bmc.ChoiceDescription ='Classical'

Here are execution plans for SQl Server 2000 and 2005

Plan3A

Plan3B

As you can see SQL Server 2005 has a bigger difference than SQL Server 2000



Now let's look at the overal picture, on a busy system you will have the customer queries running many times an hour/day. The report queries will run maybe a couple a times a day. I think this trade off is perfectly acceptable because overall your system will perform better. Another thing to keep in mind is that instead of 10 inserts you only have to do 1, same with updates, all these little things add up to a lot eventualy.




So as you can see using bitwise logic is a great way to accomplish a couple of things
Reduce table size
Speed up backup and recovery because your table is much smaller
Improve performance



Of course you have to do some testing for yourself because it might not be appropriate for your design. If your system is more of an OLAP than OLTP type of system then don't bother implementing this since it won't help you.




Cross-posted from SQLBlog! - http://www.sqlblog.com

1 comment:

Alexander Tarasul said...

This is great tip, but one sided - the opinion of performance DBA, but not an opinion of data architect.
Yes the solution will work fine for the data you described. It will stop working when number of categories will reach 32 - the int field just doesn't hold more than 31. You can patch it with bigint, but it will be broken again on 60 categories. Not good when code design limits data dictionaries.