Wednesday, May 30, 2007

Speed Up Performance And Slash Your Table Size By 90% By Using Bitwise Logic

You have all seen websites where you can pick a bunch of categories by selection a bunch of check boxes. usually what you do is store those in a lookup table and then you create another table where you store all the categories for each customer.
What if I tell you that you can store all that info in 1 row instead of 10 rows if a customer picked 10 categories.
Take a look at this




1 Classic Rock
2 Hard Rock
4 Speed/Trash Metal
You will store a value of 1 + 2 + 4 = 7(you just sum the values)

Now run this to check, the result will be 7 for a match and some other value otherwise



select 7 | 1,
7 | 2,
7 |3,
7 |4,
7 |5,
7 |6,
7 |7,
7 |8,
7 |20



What is this |(pipe symbol)?
From Books on line
The bitwise operator performs a bitwise logical OR between the two expressions, taking each corresponding bit for both expressions. The bits in the result are set to 1 if either or both bits (for the current bit being resolved) in the input expressions have a value of 1; if neither bit in the input expressions is 1, the bit in the result is set to 0.
The bitwise operator requires two expressions, and it can be used on expressions of only the integer data type category.




Here is how you would typically use this, first create this table




CREATE TABLE NumbersTable (Num int)
INSERT NumbersTable VALUES(1)
INSERT NumbersTable VALUES(2)
INSERT NumbersTable VALUES(3)
INSERT NumbersTable VALUES(4)
INSERT NumbersTable VALUES(5)
INSERT NumbersTable VALUES(6)
INSERT NumbersTable VALUES(7)
INSERT NumbersTable VALUES(8)
INSERT NumbersTable VALUES(9)
INSERT NumbersTable VALUES(10)
INSERT NumbersTable VALUES(11)
INSERT NumbersTable VALUES(12)
GO

Now run this

SELECT
Num,
CASE 7 |Num WHEN 7 THEN 'Yes' ELSE 'No' END AS COL
FROM NumbersTable

Here is the output




Num COL
---- ---
1 Yes
2 Yes
3 Yes
4 Yes
5 Yes
6 Yes
7 Yes
8 No
9 No
10 No
11 No
12 No




Okay enough theory let's start with some SQL code. First create this table which will hold all the categories




CREATE TABLE MusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))




INSERT MusicChoice VALUES(1,'Classic Rock')
INSERT MusicChoice VALUES(2,'Hard Rock')
INSERT MusicChoice VALUES(3,'Speed/Trash Metal')
INSERT MusicChoice VALUES(4,'Classical')
INSERT MusicChoice VALUES(5,'Rap')
INSERT MusicChoice VALUES(6,'Blues')
INSERT MusicChoice VALUES(7,'Jazz')
INSERT MusicChoice VALUES(8,'Alternative Rock')
INSERT MusicChoice VALUES(9,'Easy Listening')
INSERT MusicChoice VALUES(10,'Progressive Rock')
INSERT MusicChoice VALUES(11,'Punk Rock')
INSERT MusicChoice VALUES(12,'Swing')
INSERT MusicChoice VALUES(13,'Techno')
INSERT MusicChoice VALUES(14,'Pop')
INSERT MusicChoice VALUES(15,'Disco')
INSERT MusicChoice VALUES(16,'Big Band')
INSERT MusicChoice VALUES(17,'Gospel')
INSERT MusicChoice VALUES(18,'Heavy Metal')
INSERT MusicChoice VALUES(19,'House')
INSERT MusicChoice VALUES(20,'Celtic')
Now create the Bitwise table

CREATE
TABLE BitwiseMusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))





We will use the POWER function to create the correct values
run this

SELECT
id,POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice




Here is the output
id BitID ChoiceDescription
1 1 Classic Rock
2 2 Hard Rock
3 4 Speed/Trash Metal
4 8 Classical
5 16 Rap
6 32 Blues
7 64 Jazz
8 128 Alternative Rock
9 256 Easy Listening
10 512 Progressive Rock
11 1024 Punk Rock
12 2048 Swing
13 4096 Techno
14 8192 Pop
15 16384 Disco
16 32768 Big Band
17 65536 Gospel
18 131072 Heavy Metal
19 262144 House
20 524288 Celtic




Now insert it into the BitwiseMusicChoice table




INSERT BitwiseMusicChoice
SELECT POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice




Now create this customer table


CREATE
TABLE Customer (CustomerID int identity, CustomerCode uniqueidentifier not null)




Insert these 5 values first, we will use these to compare performance later




INSERT Customer VALUES('1DAB5C03-BC23-4FB5-AC3D-A46489459FE9')
INSERT Customer VALUES('F7DDCDBC-F646-493A-B872-4E2E82EA8E14')
INSERT Customer VALUES('E8A4C3D2-AEB0-4821-A49D-3BF085354448')
INSERT Customer VALUES('52581088-C427-4D2F-A782-250564D44D8C')
INSERT Customer VALUES('1B2622C4-6C17-4E74-99D6-336197FBBCFF')

Now we will insert a total of 10000 customers




SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
SET
@LoopCounter = 6
WHILE @LoopCounter <= 10000
BEGIN
INSERT
Customer VALUES(NEWID())
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO




Now add the primary key




ALTER TABLE Customer ADD CONSTRAINT pk_Customer PRIMARY KEY (CustomerCode)

Create another table to hold the choices

CREATE
TABLE CustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier)




ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_MusicChoice_ID FOREIGN KEY (MusicChoiceID) REFERENCES MusicChoice(ID)




ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_CustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)




For each customer insert 10 random choices, this should run less than a minute



SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
DECLARE
@CustID uniqueidentifier
SET
@LoopCounter = 1
WHILE @LoopCounter <= 10000
BEGIN
SELECT
@CustID = CustomerCode
FROM Customer
WHERE CustomerID = @LoopCounter
INSERT Customer VALUES(NEWID())
INSERT CustomerMusicChoice(MusicChoiceID,CustomerCode)
SELECT TOP 10 id,@CustID
FROM MusicChoice
ORDER BY NEWID()
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO

Now add these indexes
CREATE INDEX ix_CustomerMusicChoice_Cust On CustomerMusicChoice(CustomerCode)



CREATE INDEX ix_CustomerMusicChoice_ID On CustomerMusicChoice(MusicChoiceID)




Create the BitwiseCustomerMusicChoice which will hold the Bitwise values




CREATE TABLE BitwiseCustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier not null)




This will populate the BitwiseCustomerMusicChoice table




INSERT INTO BitwiseCustomerMusicChoice
SELECT SUM(POWER(2,MusicChoiceID-1)) as MusicChoiceID,CustomerCode
FROM CustomerMusicChoice
GROUP BY CustomerCode




Add the index and foreign key




ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT pk_BitwiseCustomerMusicChoice PRIMARY KEY (CustomerCode)




ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT fk_BitwiseCustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)

Now let's test performance. Hit CTRL + K (SQL 2000) or CTRL + M (SQL 2005)



These 2 queries will return something like this




ID ChoiceDescription Picked
8 Alternative Rock No
16 Big Band No
6 Blues No
20 Celtic No
1 Classic Rock No
4 Classical Yes
15 Disco Yes
9 Easy Listening Yes
17 Gospel No
2 Hard Rock No
18 Heavy Metal Yes
19 House Yes
7 Jazz Yes
14 Pop Yes
10 Progressive Rock Yes
11 Punk Rock No
5 Rap No
3 Speed/Trash Metal Yes
12 Swing Yes
13 Techno No


SELECT
mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription




SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription




Look at the execution plan
67.60% against 32.40% not bad right?



Plan1




Now run this, we will add AND bmc.ID > 0 to both queries. This will change an index scan to an index seek in the bottom query




SELECT mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND mc.ID > 0
ORDER BY ChoiceDescription




SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND bmc.ID > 0
ORDER BY ChoiceDescription



That improved the performance a little. 82.75% against 17.25%





Plan2




Now look at the tables, after running dbcc showcontig you can see that the BitwiseCustomerMusicChoice is about 1/10th the size of the CustomerMusicChoice table which is as expected.


dbcc showcontig ('BitwiseCustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'BitwiseCustomerMusicChoice' table...
Table: 'BitwiseCustomerMusicChoice' (772197801); index ID: 1, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 41
- Extents Scanned..............................: 6
- Extent Switches..............................: 5
- Avg. Pages per Extent........................: 6.8
- Scan Density [Best Count:Actual Count].......: 100.00% [6:6]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 48.0
- Avg. Page Density (full).....................: 99.41%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.




dbcc showcontig ('CustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'CustomerMusicChoice' table...
Table: 'CustomerMusicChoice' (724197630); index ID: 0, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 428
- Extents Scanned..............................: 55
- Extent Switches..............................: 54
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 98.18% [54:55]
- Extent Scan Fragmentation ...................: 40.00%
- Avg. Bytes Free per Page.....................: 386.5
- Avg. Page Density (full).....................: 95.22%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


What happens if you want to get the total count of for example Classical?




SELECT COUNT(*)
FROM CustomerMusicChoice cmc
JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
WHERE mc.ChoiceDescription ='Classical'

SELECT COUNT(*)
FROM BitwiseCustomerMusicChoice cmc
JOIN BitwiseMusicChoice bmc ON bmc.ID |MusicChoiceID =MusicChoiceID
WHERE bmc.ChoiceDescription ='Classical'

Here are execution plans for SQl Server 2000 and 2005

Plan3A

Plan3B

As you can see SQL Server 2005 has a bigger difference than SQL Server 2000



Now let's look at the overal picture, on a busy system you will have the customer queries running many times an hour/day. The report queries will run maybe a couple a times a day. I think this trade off is perfectly acceptable because overall your system will perform better. Another thing to keep in mind is that instead of 10 inserts you only have to do 1, same with updates, all these little things add up to a lot eventualy.




So as you can see using bitwise logic is a great way to accomplish a couple of things
Reduce table size
Speed up backup and recovery because your table is much smaller
Improve performance



Of course you have to do some testing for yourself because it might not be appropriate for your design. If your system is more of an OLAP than OLTP type of system then don't bother implementing this since it won't help you.




Cross-posted from SQLBlog! - http://www.sqlblog.com

Blogging On sqlblog.com

Adam Machanic invited me to start blogging on sqlblog.com, I accepted and you can find posts from now on here and there. Of course you will find the goofy posts only here ;-)
Here is the link to my blog on sqlblog.com:
http://sqlblog.com/blogs/denis_gobo/default.aspx

Tuesday, May 29, 2007

Check Your SQL For SQL Injection Vulnerabilities By Using These 15 Free SQL Injection Scanners

Memorial day is over, back to work. Your first task is to check for SQL Injections
Securiry Hacks has listed 15 of them, that list is here:

SQLIer
SQLbftools
SQL Injection Brute-forcer - SQLibf
SQLBrute
BobCat
SQLMap
Absinthe
SQL Injection Pen-testing Tool
SQID
Blind SQL Injection Perl Tool
SQL Power Injection Injector
FJ-Injector Framwork
SQLNinja
Automagic SQL Injector
NGSS SQL Injector

Read the details and get the download links here: http://www.security-hacks.com/2007/05/18/top-15-free-sql-injection-scanners

If you are using a lot of dynamic SQL then don't forget to read The Curse and Blessings of Dynamic SQL written by Erland Sommarskog, SQL Server MVP.

Sunday, May 27, 2007

Top 10 .NET Framework Technologies to Learn in 2007. What do you think?

help.net list the following 10 .NET Framework Technologies to Learn in 2007. What do you think? I think that you can not known all of these as well if you picked only a handful. I marked the ones that I think are absolutely needed in red and the maybe ones in blue. As you can see I marked only 4 ([edit]that was initially, I added threading since it shouldn't take you that long to master that and I added some other ones that are essential for a developer[/edit]) and yes SQL Server is one of them. Even to master SQL Server 2005 is very difficult if not impossible. Do you know all of the following good enough that you can work with it without a problem?

SSIS
SSAS
ServiceBroker
Notification Services
Replication
SQL CLR
T-SQL

You don't and you really can't unless you don't have a life and sit in front of a PC 24/7 but even then it is doubtful
What I have noticed is that a lot of .NET developers didn't do Ajax until MS released ASP.NET Ajax, it is like they are oblivious to the fact that there are other frameworks out there like prototype for example. It is kind of sad that the technology that was first developed by Microsoft and made its way into Outlook Web Access took sooooooo long to get into Visual Studio. Same thing with Nant, Subversion etc etc. I mean how many of you are still using SourceSafe? I am not using it because Subversion is so much better and it's also free (not that that matters because we get it with our MSDN subscriptions anyway).

So open your eyes check CodePlex regularly for new things. Do you know there is a NHibernate.Spatial project? No, did you know that MbUnit 2.4 has been released?
Download it here: http://mb-unit.googlecode.com/files/MbUnit-2.4.197.exe



Here is the list (from help.net)


1) WCF (Windows Communication Foundation): While WCF is certainly less "sexy" than say WPF / Silverlight, it is going to represent the backbone of solving business problems with .NET going forward from here. Big organizations are starting to embrace it, and it provides a cohesive framework for solving business problems in a distributed, integrated way. WCF integrates Remoting, WebServices, EnterpriseServices and Transactions, WSE, MSMQ, and much more into a cohesive programming framework. If you intend to make it your business to study just one new .NET technology this year, make it WCF.

2) ADO.NET (and LINQ): ADO.NET is how you talk to a data store, and databases are such a ubiquitous part of what you will do as a developer that you have NO CHOICE but to become extremely competent in this area. The next big thing in how to talk to data is LINQ with language extensions and entity objects that "represent" mappings of data and its relationships. If you don't become at least familiar with all this stuff, somebody else is going to eat your lunch.

3) WPF (Windows Presentation Foundation): Everything you learned about Windows Forms, pages in a browser, and UI elements is going to go out the window, because Microsoft has already declared that WPF is the new way we're gonna do this stuff. Its already built in to Windows Vista, and the XPS (XML Paper Specification) is already built into the printer subsystem. Don't hang on to the old- get with the new just as fast as you can get your little tushy in gear! And besides that, WPF is just so friggin' cool, it will knock your socks off in hi-def streaming video!

4) SQL Server 2005 (and on): I understand this isn't really .NET, but then again it really is. SQL Server 2005 hosts CLR integration of managed code. That's not only revolutionary, it provides a power to the programmer that you cannot get on other platforms. You have to learn everything you can about SQL Server 2005 including Service Broker, because it will help you to be a better programmer and problem - solver.

5) ASP.NET 2.0: Even non "web programmers" need to understand how this works. The feature set has grown and matured, and you are looking at dynamic languages, LINQ, AJAX, and much more being integrated into the ASP.NET Framework (not to forget Silverlight - the sexiest technology of all of them!).

6) Security. Developers are notoriously weak on security ("Who cares about permission sets - I'm just a Code Monkey"). Unfortunately, we do not live in a perfect world and there are evil people out there who jump with glee when they can mess up your day. The more you become an expert about security, the higher your pay will be: expert security consultants make upwards of $300 / hr.

7) TDD (Test Driven Development): Unit, regression and integration testing aren't a luxury - if you want to develop robust systems then you must have a test protocol. And to do testing right, you need to study how to write tests and what tests to write. There are several excellent books on TDD and at least one that specifically focuses on .NET.

8) Networking (System.Net and related): Networking -- TCP, UDP, HTTP, FTP, and on -- are an integral part of what you need to know how to do in order to glue programs together and make them be able to talk with each other and your data. The more you know about this area, the better equipped you'll be to make the right kind of choices when you are tasked with creating business logic through code.

9) Threading: When asked to develop a multi-threaded object or to use a ThreadPool, 95 out of 100 programmers who claim to be professional .NET developers fall flat on their faces! You need to study all the threading primitives, know how they are used, be able to use the .NET or a custom ThreadPool, and manage threads in your applications.

10) Learning. That's right, I classify learning as a technology. People need to become smarter about how to learn, and especially, WHAT to learn. Just as you become a better programmer when you learn to use the Google or other search engines more effectively, you become a better programmer when you can detail for yourself what you need to learn to be better at your craft, and write down a plan for implementing that learning process.



I would like to have all of them in red but it is time to start specializing just like doctors do, you have to become a specialist. I rather have a lung doctor, an eye doctor and an orthopedic surgeon on my team than 3 physicians who know a little about everything but not a lot about one thing.

Thursday, May 24, 2007

Object Databases Suck RDMBS's Rock

Gavin King (Mr Hibernate) trashes what we call 'Object Databases'. This is excellent reading material and full of information why RDBMS's are still around after all those claims of object database vendors that their product is so incredibly more efficient

Read the article here: http://blog.hibernate.org/cgi-bin/blosxom.cgi/2007/05/23#in-defence

And remember the latest version of Nhibernate DOES support stored procedures! No more wizard generated SQL that is hard to debug because you don't know what it looks like.

Wednesday, May 23, 2007

Book Review: Pro SQL Server 2005 Database Design and Optimization

Pro SQL Server 2005 Database Design and Optimization
By Louis Davidson, Kevin Kline, Kurt Windisch
ISBN: 1590595297
672 pp.
Published: May 2006

Before I start with the book review here is a little story that took place in 1995. A former colleague wanted to open a bar in the East Village (New York City). He looked and looked and found this place somewhere on 9th street. He looked around the place and liked what he saw. Then he went to the back and saw about seven wooden beams against the wall. When he asked about them the owner said that is to keep the wall from collapsing. Anyway my friend settled on another place and opened his bar in a former parking garage, the name of the place is Liquids and it is located on east 10 Street between First Avenue and Avenue A.

Now why am I telling you this story? This story happens in the database world all the time, think of the data model as the foundation of your house if your foundation is weak you need kludges (wooden beams) to prevent your structure from collapsing. How many times do you see on newsgroups questions relating to split string because someone stored a comma delimited string in a column instead of having a lookup table? You have to start with the proper design, once you have that everything else becomes so much easier. It is easy to change a proc or some code somewhere. Once you have to redo your design then you have to also go through all the code and make changes and then also pray that you didn’t break anything. So this is what this book is all about; to teach you how to properly design a data model for speed, concurrency, security, interoperability, integrity and much more. There is no need for triggers when you can use check constraints which in turn will perform much better.

I got my copy of the book the day after our twins were born, that is a little more than 10 months ago. I have read the book many times but I did not read it in the order that it is written in. However I do recommend reading at least the first 5 chapters in the correct order, this is especially true if you are new to databases.

So what do I think of the book? I think that this book is excellent, it is written in a way that is not dry, and you will actually laugh a couple of times when reading this book. The book is written in such a way that it goes into as much detail as necessary to explain the concept and also provides links where needed for material that is not in the scope of the book. Most chapters end with a best practices and a summary section which is very useful if you want to find out what the chapter is about without reading the whole chapter first.

The book has 11 Chapter and 2 appendixes which are listed below

Chapter 1 Introduction to Database Concepts
Chapter 2 Data Modeling
Chapter 3 Conceptual Data Modeling
Chapter 4 The Normalization Process
Chapter 5 Implementing the Base Table Structures
Chapter 6 Protecting the Integrity of Your Data
Chapter 7 Securing Access to the Data
Chapter 8 Table Structures and Indexing
Chapter 9 Coding for Concurrency
Chapter 10 Code-Level Architectural Decisions
Chapter 11 Database Interoperability
APPENDIX A Codd’s 12 Rules for an RDBMS
APPENDIX B Datatype Reference
INDEX

The name of the chapter indicates what the chapter is about, I won’t go into detail by reviewing every chapter.

If you are completely new to SQL then this is the book for you because a good design is the foundation of everything. Now there are two ways to find this out the first way is the easy way. You purchase the book, read it and use the code. The second way is more painful. You have a weak database foundation, performance is bad, customers complain, your boss is breathing down your neck and the only shows you watch on TV are the late shows because you have to update statistics and reindex your database between 7PM and 10PM every single day to make it perform somewhat responsive. If you are an advanced/intermediate developer don’t think that this book is not for you. It is! I have learned a bunch of new tricks/tips which I would not have if I did not read the book. Obviously I am not the only one who praises the book; just take a look at the reviews on Amazon

If you want check out the book first here is the link to the sample chapter: http://www.apress.com/book/supplementDownload.html?bID=10005&sID=3490

Here is the link to Louis’ blog where he posts updates about the book http://drsql.spaces.live.com

I did an interview with Louis a while back, you can find that here

The nest 2 reviews will be Itzik’s and then Adam’s book. I think that Itzik’s will be next week and Adam’s the week after that.

Tuesday, May 22, 2007

Inconvenient Truth???

3 Monitors But No Email?I wonder how many trees he killed with all that paper on his desk? He better have solar panels to power those 3 monitors!

Notice how he doesn't have an inbox anywhere. Is there a 4th monitor in Al Gore's office to check email??? Surely the guy who invented the internet uses email.

Sunday, May 20, 2007

It's Time For Violence: A song about Databases vs CodeMonkey

Since it is the weekend Here are 2 song for you, they both have something to do with programming.
First song is Time For Violence
A Song about Databases
Lyrics By Roy Osherove

You can download te mp3 here: http://weblogs.asp.net/rosherove/archive/2007/05/19/it-s-time-for-violence-a-song-about-databases.aspx

Here are some of the lyrics

Hello DB My old friend
I need to work with you again
That stored procedure aint lookin' well
Who wrote that trigger should go to jail
and that index, It is slower than a snail
What the hell


The second song is CodeMonkey by Jonathan Coulton
http://www.jonathancoulton.com/2006/04/14/thing-a-week-29-code-monkey/

Code Monkey get up get coffee
Code Monkey go to job
Code Monkey have boring meeting
With boring manager Rob
Rob say Code Monkey very dilligent
But his output stink
His code not “functional” or “elegant”
What do Code Monkey think?
Code Monkey think maybe manager want to write god damned login page himself
Code Monkey not say it out loud
Code Monkey not crazy, just proud

Code Monkey like Fritos
Code Monkey like Tab and Mountain Dew
Code Monkey very simple man
With big warm fuzzy secret heart:
Code Monkey like you

Code Monkey hang around at front desk
Tell you sweater look nice
Code Monkey offer buy you soda
Bring you cup, bring you ice
You say no thank you for the soda cause
Soda make you fat
Anyway you busy with the telephone
No time for chat
Code Monkey have long walk back to cubicle he sit down pretend to work
Code Monkey not thinking so straight
Code Monkey not feeling so great

Code Monkey like Fritos
Code Monkey like Tab and Mountain Dew
Code Monkey very simple man
With big warm fuzzy secret heart:
Code Monkey like you
Code Monkey like you a lot

Code Monkey have every reason
To get out this place
Code Monkey just keep on working
See your soft pretty face
Much rather wake up, eat a coffee cake
Take bath, take nap
This job “fulfilling in creative way”
Such a load of crap
Code Monkey think someday he have everything even pretty girl like you
Code Monkey just waiting for now
Code Monkey say someday, somehow

Code Monkey like Fritos
Code Monkey like Tab and Mountain Dew
Code Monkey very simple man
With big warm fuzzy secret heart:
Code Monkey like you

And here is the Code Monkey video on YouTube

Wednesday, May 16, 2007

Interview with Adam Machanic Author Of Expert SQL Server 2005 Development

Expert SQL Server 2005 DevelopmentI am a big fan of Adam Machanic and the way he thinks about SQL server development. How may SQL developers unit test their procedures or use assertions in their code? Well I can guarantee that there are very few of those. I was very excited when Adam Machanic contacted me to give me a copy of Expert SQL Server 2005 Development for review. I did not finish reading the whole book yet but what I have read so far is very good. If you are an advanced developer and want to kick it up a notch then this is the book for you.


The question-and-answer session with Adam that follows was conducted via email.

What is the audience for this book? Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
The book is certainly not for beginners; I make a lot of assumptions about the level of knowledge possessed by the reader. The reader should understand the basics of writing T-SQL, creating indexes, reading query plans, etc. It's difficult to differentiate between intermediate and advanced users, as everyone has their own opinion of what those levels actually mean! I've covered several not-so-common topics, and my hope is that that almost any SQL Server practitioner who has advanced beyond the basics can learn something from the book.


What are the most important things a person can do to master Transact-SQL?
Practice, practice, and more practice! The more complex queries you write, and the greater the variety of problems you solve in T-SQL, the easier it gets. Just like other programming languages, T-SQL programmers use and reuse various patterns as they work on different projects and problems. Gaining skill in T-SQL is really nothing more than a process of learning when and how to properly apply the patterns to any given situation.


Adam, the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?
When I first started working with SQL Server and other Microsoft technologies I discovered Microsoft's news server and was shocked to discover the overall quality and high signal-to-noise ratio compared to many other (Usenet) newsgroups I'd previously frequented. Questions I asked were actually answered--by people who knew what they were talking about, no less! After spending quite some time asking my own questions I realized I could answer a few here and there too, and give something back to the community that had helped me so much. Over time I became hooked. What a great way to kill free time (especially while waiting for software to compile or long queries to run)--not only does it feel good to help others solve their problems, but it also helps to strengthen my skills as a developer. As I mentioned in the previous question, the most important key to software development greatness is practice. By answering newsgroup and forum posts, you can get exposure to all sorts of business problems and scenarios that you would otherwise never see. There is no better learning experience that I have found.


Why is it important to think in sets instead of procedurally?
It's often said that thinking in sets takes advantage of the declarative nature of SQL; and while this is certainly true I think it's important to drill down to why that's such a big deal. SQL query performance, generally speaking, comes down to I/O performance. Too many I/Os, or an I/O-bound system, and your query won't return fast enough--simple as that. When we write set-based queries, we allow the query engine to optimize in order to minimize the number of index requests--and, therefore, I/O operations--necessary to satisfy the query. Working procedurally, with loops, means that the opposite happens--the maximum number of index operations will happen on every invocation, and therefore the query will perform in constant time—at the worst possible level.


Why do you feel it is important to properly unit test procedures?
I cover unit testing in some detail in Chapter 2 of the book, including when and why you want to do it. But what it boils down to is that creating a solid unit test infrastructure helps to guarantee that as you change your code over time, you won't break anything--and if you do, you'll find out about it very quickly. Studies have shown that by far the most expensive part of the software development process is maintenance. This is due to the fact that as we change our software, we break other pieces (dependencies)--and oftentimes, those dependency breaks silently cause subtle issues, that aren't noticed until long after the change was actually made. This can make it extremely difficult to track down and fix the root causes of problems. Comprehensive unit testing makes it possible to eliminate a lot of that delay, by alerting the developer right away if something doesn't look good. This, in turn, results in better quality software, happier customers, more business, and as a result, bigger bonus checks. A win-win situation, if you ask me.


Which chapter was the hardest to write and can you explain why?
Chapter 10, "Working with Temporal Data," took me four months to write. It's my favorite chapter in the book, and one of my favorite topics to think about, but it covers a huge variety of scenarios. As I mention in the chapter, temporal data finds its way into every single database, so it's a large and far-reaching topic. Attempting to cover it in a fairly complete manner turned out to be quite a big task!


Why do you write technical books?
Mainly for the groupies. But aside from that obvious perk, I do it simply because I love sharing technology and techniques with others. Writing a book is similar to participating in a newsgroup or technical forum, but on a much deeper level. Whereas I might answer a question on a forum in 400 words, or write a 3,000-word article, in a book I can cover the same topic in 15,000 or more words--really guide the reader through all of the pitfalls and ins and outs of a given area, and do things that would be impossible in another media (except, perhaps, in a long series of blog posts or articles). Plus, let's be honest--it's just a real thrill seeing my name on the cover of a book!


Who are your favorite authors?
For technical titles, my favorites are Steve McConnell ("Code Complete," "Rapid Development"), Joe Celko ("SQL For Smarties" and others), and Andrew Troelsen ("Pro C# and the .NET Platform"). These authors manage to convey heavy technical topics in very readable formats--which is something I aspire to doing in my own writing.

Unfortunately, I haven't been reading much fiction recently. The last really great novel I read was "Cloud Atlas" by David Mitchell. I'm also a longtime fan of science fiction writers such as William Gibson ("Neuromancer") and Neil Stephenson ("Snow Crash," "Cryptonomicon").


I noticed you have a chapter that deals with application concurrency, do you feel that developers don't have a multi-user/multi-app mindset when first developing the data model/procedures?
With the recent trend towards multiprocessor and multicore machines, threading and synchronization have been hot topics. I've read lots of interviews and articles about how many developers don't understand multithreading well enough. Yet as database developers, I feel like we've been there for years--the database is almost always a shared resource, and managing synchronization is no less important there than in the application tier. A lot of the time we tend to think about concurrency merely in the context of isolation levels and the like, but there is a lot more to it than that--an application will often push much of its synchronization logic all the way into the data tier (whether on purpose or not), and how to properly handle things is really more of a business question than a technical one. When writing that chapter, I tried to frame the issues in terms of business requirements, and then show how to use SQL Server to satisfy them. I think that developers are sometimes prone to getting too excited about technology and trying to find a business problem to which they can apply some new technique--the whole "hammer looking for a nail" mentality--rather than the correct route of determining the business problem and then using technology to solve it.


What SQL Server books are on your bookshelf?
I've now gotten rid of most of my SQL Server 2000 books, except for Ken Henderson's Guru's Guide" series and Kalen Delaney's "Inside SQL Server 2000". My bookshelves (note the plural!) are filled with SQL Server 2005 books--for the first year or so after its release I managed to get almost every new book that was released. I eventually slowed down a bit, but it's hard--there is a lot of great material out there! I'm a really big fan of the "Rational Guides" -- short, concise, yet in-depth guides to just one topic, generally very well-written and easy to digest.


Why did you decide to include a chapter on spatial data?
Through nothing more than luck of the draw I ended up on a couple of projects involving heavy spatial queries, and found that it was an area I really enjoyed. I love to solve difficult problems, and since comparatively very few people are doing spatial in SQL Server, I found that there was a lot of room to innovate--so I ended up having a great time with these projects and totally got hooked. I feel that spatial data is an area we're going to see a lot more of in the future. Spatially-enabled devices are becoming more and more prevalent, and of course all of those devices are going to need data. Unfortunately, due to time constraints I was not able to write that chapter, so my friend Hugo Kornelis tackled it. Hugo did an absolutely fantastic job of covering the topic--I think it's a great chapter, and will help readers get right up to speed.


Are you still making your own beer?
I live in a fairly small apartment at the moment and am somewhat cramped for space as well as time, so I haven't brewed a batch beer for the last couple of years. However, I have been experimenting with ciders quite a bit, and have made a variety of batches using all sorts of adjuncts including honey, brown sugar, and malt extract. Cider is very forgiving and needs to sit around for a long time before it's drinkable, so it's easy to make in a small space with limited time. I've also made a few batches of ginger beer, using a recipe that we actually designed in an online SQL Server forum!
(http://www.simple-talk.com/community/forums/thread/1232.aspx )



Table Of Contents
CHAPTER 1 Software Development Methodologies for the Database World
CHAPTER 2 Testing Database Routines
CHAPTER 3 Errors and Exceptions
CHAPTER 4 Privilege and Authorization
CHAPTER 5 Encryption
CHAPTER 6 SQLCLR: Architecture and Design Considerations
CHAPTER 7 Dynamic T-SQL
CHAPTER 8 Designing Systems for Application Concurrency
CHAPTER 9 Working with Spatial Data
CHAPTER 10 Working with Temporal Data
CHAPTER 11 Trees, Hierarchies, and Graphs
INDEX

Sample: Chapter: Ch. 01 - Software Development Methodologies for the Database World
Amazon Link: Expert SQL Server 2005 Development

Tuesday, May 15, 2007

flickrvision

David Troy the person who brought you twittervision has done it again this time with flickr. Check out flickrvision


Who needs animated desktops anymore? Be warned it can get very addictive.

Monday, May 14, 2007

Google: A Girl's Best Friend

Type these searches into a Google search box

she created
she invented
she discovered
she debugged

Look at what Google suggests.

Make Your Case Sensitive Searches 1000 Times Faster

I had an case sensitive update query that ran in about 9 minutes. Since it was a case sensitive query it did an index scan not an index seek.
Once I modified my WHERE clause the update took a little less than 3 seconds

Let's get started and see what I did

First create this table

CREATE TABLE #CaseSensitiveSearchTemp (Val CHAR(1))

INSERT #CaseSensitiveSearchTemp VALUES('A')
INSERT #CaseSensitiveSearchTemp VALUES('B')
INSERT #CaseSensitiveSearchTemp VALUES('C')
INSERT #CaseSensitiveSearchTemp VALUES('D')
INSERT #CaseSensitiveSearchTemp VALUES('E')
INSERT #CaseSensitiveSearchTemp VALUES('F')
INSERT #CaseSensitiveSearchTemp VALUES('G')
INSERT #CaseSensitiveSearchTemp VALUES('H')

Now we will insert some lowercase characters

INSERT #CaseSensitiveSearchTemp
SELECT LOWER(Val) FROM #CaseSensitiveSearchTemp

Now we will create our real table which will have 65536 rows

CREATE TABLE CaseSensitiveSearch (Val VARCHAR(50))

We will do a couple of cross joins to generate the data for our queries

INSERT CaseSensitiveSearch
SELECT t1.val + t2.val + t3.val + t4.val
FROM #CaseSensitiveSearchTemp t1
CROSS JOIN #CaseSensitiveSearchTemp t2
CROSS JOIN #CaseSensitiveSearchTemp t3
CROSS JOIN #CaseSensitiveSearchTemp t4


This should give you 65536 rows

SELECT * FROM CaseSensitiveSearch


Create an index on the table

CREATE INDEX IX_SearchVal ON CaseSensitiveSearch(Val)


This is how you do a case sensitive search

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS



Now hit CRLK + K (SQL Server 2000) or CRLK + M(SQL Server 2005)

run these 2 queries in one batch by highlighting them both and hitting F5

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

Look at the execution plan, I get 98.71% for the first query and 1.29% for the second query. Just by adding the AND condition SQL server is able to do an index seek and run the query many times faster




Now try it with a lowercase a

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'aBCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'aBCD'


You see it all works without a problem, the correct result is returned

Sunday, May 13, 2007

Great Programming Quote

"pasting code from the internet into production code is like chewing gum found in the street."

None of use pasted straight into production code from a newsgroup/forum right? <g>

Thursday, May 10, 2007

Why Does OBJECTPROPERTY Have A TableIsFake Property?

You can run the following function (OBJECTPROPERTY(object_id, N'TableIsFake') ) on an object and it will return 1 if the table is fake and 0 otherwise.
What does BOL say? The table is not real. It is materialized internally on demand by SQL Server. So does this mean it is a table valued function?
Well one way to find out.

Run this

USE AdventureWorks;
GO
SELECT name, object_id, type_desc,OBJECT_DEFINITION(object_id) as object_definition
FROM sys.objects
WHERE OBJECTPROPERTY(object_id, N'TableIsFake') = 1
ORDER BY type_desc, name;
GO


And we get back a table valued function in the resultset

name: ufnGetContactInformation
object_id: 439672614
type_desc: SQL_TABLE_VALUED_FUNCTION
object_definition; CREATE FUNCTION .... (I truncated the rest)


Now create another function

CREATE FUNCTION [dbo].[ufnGetSomeTable]()
RETURNS @SomeTable TABLE (
[ContactID] int PRIMARY KEY NOT NULL)
BEGIN
INSERT
@SomeTable VALUES(1)
INSERT @SomeTable VALUES(2)

RETURN
END

GO

Run this to make sure it works
SELECT * FROM ufnGetSomeTable()

run the same query again

USE AdventureWorks;
GO
SELECT name, object_id, type_desc,OBJECT_DEFINITION(object_id) as object_definition
FROM sys.objects
WHERE OBJECTPROPERTY(object_id, N'TableIsFake') = 1
ORDER BY type_desc, name;
GO


And yes we get 2 functions back ;-)

Now we will create some temp tables to see if those are fake

CREATE TABLE #testFakeTable1234 (id int)
CREATE TABLE ##testFakeTable1234 (id int)

Now run this and you will see that those are real and not fake

USE tempdb;
GO
SELECT name, object_id, type_desc,OBJECTPROPERTY(object_id, N'TableIsFake') as IsFake
FROM sys.objects
WHERE name LIKE '#testFakeTable1234%'
OR name LIKE '##testFakeTable1234%'
ORDER BY type_desc, name;
GO

DROP TABLE #testFakeTable1234,##testFakeTable1234


Now create this function

USE AdventureWorks;
GO

CREATE FUNCTION dbo.fn_nums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1) ,
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L0)
SELECT n FROM Nums WHERE n <= @n;

Test it out to make sure it works
SELECT * FROM dbo.fn_nums(2)


Run the following 2 queries

USE AdventureWorks;
GO

SELECT name, object_id, type_desc,OBJECTPROPERTY(object_id, N'TableIsFake') IsFake, OBJECTPROPERTY(object_id, N'IsTableFunction') IsTableFunction
FROM sys.objects
WHERE OBJECTPROPERTY(object_id, N'TableIsFake') <> OBJECTPROPERTY(object_id, N'IsTableFunction')
ORDER BY type_desc, name;


USE AdventureWorks;
GO

SELECT name, object_id, type_desc,OBJECTPROPERTY(object_id, N'TableIsFake') IsFake,
OBJECTPROPERTY(object_id, N'IsInlineFunction') IsInlineFunction
FROM sys.objects
WHERE OBJECTPROPERTY(object_id, N'TableIsFake') <> OBJECTPROPERTY(object_id, N'IsInlineFunction')
ORDER BY type_desc, name;


So a SQL_table valued function is a fake table but a SQL inline table valued function is not a fake table????
So there you have it, confused? Good!

Wednesday, May 09, 2007

Some Katmai (SQL Server 2008?) News

Microsoft issued a Katmai press release, you can read the full press release here: http://www.microsoft.com/presspass/press/2007/may07/05-09KatmaiPR.mspx?rss_fdn=Press%20Releases

Here are the important parts:

Scheduled to be available in 2008, SQL Server “Katmai” will deliver on Microsoft’s overall data platform vision to meet the needs of the coming data explosion and the next generation of data-driven applications.

Data Platform to Power Business Insights Regardless of Size or Budget

SQL Server “Katmai” is designed to do the following:

• Provide a scalable and reliable platform with advanced security technology for even the most demanding applications

• Reduce the time and cost of managing data infrastructure with innovative policy-based management


Redefining Pervasive Insight

SQL Server “Katmai” brings powerful BI capabilities and valuable data even closer to every user:

• Empowers users to easily consume information due to increased integration with front-end tools in the 2007 Microsoft Office system, including Office Excel® 2007, Excel Services, Office SharePoint® Server and Office PerformancePoint Server 2007

• Provides reports of any size or complexity internally within organizations and externally to partners and suppliers

• Aligns corporate decision-making by collaborating on key analysis and reports within Microsoft Office SharePoint Server

• Integrates all relevant data within the enterprise into a scalable and comprehensive data warehouse platform


Dynamic Development for Data Management Solutions

SQL Server “Katmai” offers these features for developers:

• Provides an integrated development environment with Microsoft Visual Studio® and .NET Framework that will accelerate development of new applications with a higher level of data abstraction

• Enables developers to synchronize data from virtually any device to the central data store



Beyond Relational Data

SQL Server “Katmai” enables users to do the following:

• Manage any type of data including relational data, documents, geographic information and XML

• Work with a consistent set of services and tools


Availability

SQL Server “Katmai” is scheduled to be delivered in 2008. More information is available at http://www.microsoft.com/sql/prodinfo/futureversion/default.mspx


Download the Katmai data sheet here: http://download.microsoft.com/download/B/F/2/BF24C54E-5635-4C79-AFB4-0C3F840E79F4/Katmai_datasheet_Final.pdf

Monday, May 07, 2007

Three Ways To Return All Rows That Contain Uppercase Characters Only

How do you select all the rows that contain uppercase characters only? There sre three ways to do this
1 Compare with BINARY_CHECKSUM
2 Use COLLATE
3 Cast to varbinary

Let's first create the table and also some test data

CREATE TABLE #tmp ( x VARCHAR(10) NOT NULL )


INSERT INTO #tmp
SELECT 'Word' UNION ALL
SELECT 'WORD' UNION ALL
SELECT 'ABC' UNION ALL
SELECT 'AbC' UNION ALL
SELECT 'ZxZ' UNION ALL
SELECT 'ZZZ' UNION ALL
SELECT 'word'


if we want only the uppercase columns then this is supposed to be our output

WORD
ABC
ZZZ

Let's get started, first up is BINARY_CHECKSUM

SELECT x
FROM #TMP
WHERE BINARY_CHECKSUM(x) = BINARY_CHECKSUM(UPPER(x))


Second is COLLATE

SELECT x
FROM #TMP
WHERE x = UPPER(x) COLLATE SQL_Latin1_General_CP1_CS_AS

Third is Cast to varbinary

SELECT x
FROM #TMP
WHERE CAST(x AS VARBINARY(10)) = CAST(UPPER(x) AS VARBINARY(10))


Of course if you database is already case sensitive you can just do the following

SELECT x
FROM #TMP
WHERE UPPER(x) = x

That will work, how do you find out what collation was used when your database was created? You can use DATABASEPROPERTYEX for that. I use the model DB here because when you create a new DB by default it inherits all the properties from the model DB.
When I run this
SELECT DATABASEPROPERTYEX( 'model' , 'collation' )

I get this as output: SQL_Latin1_General_CP1_CI_AS


What does all that junk mean? Well let's run the following function (yes those are 2 colons ::)
SELECT *
FROM ::fn_helpcollations ()
WHERE NAME ='SQL_Latin1_General_CP1_CI_AS'

The description column contains this info


Latin1-General, case-insensitive, accent-sensitive,
kanatype-insensitive, width-insensitive for Unicode Data,
SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

You can read some more info about Selecting a SQL Collation here: http://msdn2.microsoft.com/en-us/library/aa176552(SQL.80).aspx

Public CTP of Visual Studio Team Edition for Database Professionals first Service Release Available For Download

What is new?

3+4 part name resolution through a new feature called "database references"
File groups and files are now part of the project settings and can be fully parameterize then to fit your deployment needs (this replaces the need for the storage deployment script)
SQLCMD variable support; this is enabled through a new project property page for creating and editing variables that you can then use inside your scripts.
Variables are stored inside the project file and can be mapped to MSBuild properties.

Read more details and get the download from the Data Dude blog