Thursday, June 28, 2007

Guess What I Will Be Doing Tomorrow (June 29th 2007) At 6PM

Did you think I would be waiting in line like a dummy to purchase the iPhone? Wrong my friend, I will pick up a birthday cake for my wife. As a matter of fact my home is Apple free. That is right not even Quicktime is installed ;-)

Anyway why would I buy the iPhone? I checked my bill last month and I have used a whopping 6 minutes. I don’t really use my phone except for emergencies. It is good that my wife and I share the minutes, if not she would go over every month.

So are you buying an iPhone? And if yes then please tell me why?

The next post will be technical again; it will be about Scrum and Poker.

Friday, June 22, 2007

Summer SQL Teaser #4 Nulls and Counts

First create this table

CREATE TABLE Teaser (ID int)
INSERT Teaser VALUES(1)
INSERT Teaser VALUES(2)
INSERT Teaser VALUES(1)
INSERT Teaser VALUES(2)
INSERT Teaser VALUES(NULL)


Without running this try to figure out what the result will be

SELECT COUNT(*),
COUNT(ID),
COUNT(DISTINCT ID)
FROM Teaser


For some more NULL fun you can read NULL Trouble In SQL Server Land

Thursday, June 21, 2007

Good Distributed Partitioned Views / Federated Databases Article

The Microsoft SQL Server Development Customer Advisory Team has a nice blog post about Distributed Partitioned Views / Federated Databases

They cover the following definitions

Definition 1: Local Partitioned View – A single table is horizontally split into multiple tables, usually all have the same structure.

Definition 2: Cross Database Partitioned View – tables are split among different databases on the same server instance

Definition 3: Distributed (across server or instance) Partitioned View. Tables participating in the view reside in different databases which reside on different servers or different instances.

Make sure you read the list of 13 items under Lessons Learned on Distributed Partitioned Views: (multiple servers involved)


Link to the article: http://blogs.msdn.com/sqlcat/archive/2007/06/20/distributed-partitioned-views-federated-databases-lessons-learned.aspx

Monday, June 18, 2007

Which SQL Server 2005 Analysis Services Book Should I Get?

Who can recommend a good SQL Server 2005 Analysis Services book?

I know SQL Server 2000 Analysis Services but did not work with SQL Server 2005 Analysis Services yet. As a matter of fact I haven’t touched Analysis Services in 2 years. I don’t need a book which explains what a start or join schema is, I know what a slowly changing dimension is, I also know the difference between a fact table, a dimension table and MOLAP/ROLAP/HOLAP.

The 2 books that I used previously are the WROX book (Professional SQL Server 2000 Data Warehousing with Analysis Services) and the MS Press Step by Step book. I remember liking both of them, are their successors as good?
The problem with the reviews on Amazon is that it doesn’t match my expectations; the WROX 2000 book only got 2.5 stars which I think is way too low. While the book is not perfect it deserves more than 2.5 stars.


Here are the 5 books I am considering

Microsoft SQL Server(TM) 2005 Analysis Services Step by Step

Delivering Business Intelligence with Microsoft SQL Server 2005

Professional SQL Server Analysis Services 2005 with MDX

Microsoft SQL Server 2005 Analysis Services

The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset



BTW I don’t need the book right now, I won’t actually start working with this until next year. If you know of a book that is coming out between now and January please let me know also.

Thursday, June 14, 2007

Book Review: Expert SQL Server 2005 Development By Adam Machanic

Expert SQL Server 2005 DevelopmentIf you are an advanced or intermediate SQL Server developer then this is the book for you. Adam understands real world scenarios and understands that databases are part of a bigger group in the business world. The database is usually the most important asset in an organization. All your data is in the database, you need to secure it, this is where encryption, privilege and authorization comes in. The ratio of web servers to database servers is usually many to one, it is easy to scale out with web servers however with database servers this is not so easy. This is a reason why your code needs to be optimized and designed for application concurrency.

I recommend this book to any intermediate or advanced SQL Server developer. This book is not a book that is like the other book you have but 2 chapters are different. NO, this book contains a lot of good info which is not available in other books. I learned a lot from reading this book and you will too. Here is the breakdown of what is covered in the chapters.

Chapter 1 Software Development Methodologies for the Database World
Adam explains what Cohesion, Coupling and Encapsulation is, where the business logic should live and the balance between maintainability, performance, security and more.

Chapter 2 Testing Database Routines
This chapter is worth the price of the book by itself. You will learn how to unit test your procedures, evaluate performance counters and this chapter introduces the SQLQueryStress Performance Tool (see picture below) which will be used in other chapters. This is a very useful tool if you have to tune a query. How many times do you set statistics time and statistics IO on and off to see the reads and CPU time? This tool does it all for you, paste in your query or proc call, specify how many times you want to run it that is it. This tool will save you many stressful (pun intended) hours





Chapter 3 Errors and Exceptions
This chapter explains the different type of exceptions and how to do error handling. You will also find out what a ‘doomed transaction’ is, this is the one where you get this user friendly message: “The current transaction cannot be commited and cannot support operations that write to the log file. Roll back the transaction.”

Chapter 4 Privilege and Authorization
This chapter explains what impersonation and ownership chaining is. Also covered is how to use EXECUTE AS and how to sign procedures.

Chapter 5 Encryption
This chapter will explain encryption to you in a clear and concise matter. You will learn how to improve performance by using Message Authentication Code. The difference between symmetric and asymmetric key encryption is covered as well as all the terminology that is needed to really understand encryption.

Chapter 6 SQLCLR: Architecture and Design Considerations
What this chapter covers is SQLCLR security, why to use SQLCLR and how to enhance Service Broker Scale-Out with SQLCLR

Chapter 7 Dynamic T-SQL
You want to protect your data? Then this is something you have to read. You will learn how to deal with sql injection, why sp_executesql is much better than exec and the performance implications of parameterization and caching.

Chapter 8 Designing Systems for Application Concurrency
If you are running an OLTP system and you are suffering from blocking/locking then this is the chapter for you. Isolation levels and how they affect concurrency is explained. This chapter uses the SQLQueryStress Performance Tool to show you the difference it makes in performance when you slightly change your proc.

Chapter 9 Working with Spatial Data
Spatial data, this is what a lot of people are storing these dates, unfortunately calculating the distance between 2 points is not as easy as it seems (the earth is not flat you know ;-( ) This chapter covers a couple of ways to represent Geospatial Data.


Chapter 10 Working with Temporal Data
Dates are everywhere in the database but unfortunately a lot of people do not know how dates are stored internally and how to write efficient queries which will cause an index seek instead of a scan. Calendar tables, time zones and intervals are all covered in this chapter

Chapter 11 Trees, Hierarchies, and Graphs
The difference between Nested Set Model, Persisting Materialized Paths and Adjacency list Hierarchies are explained. There is code included that shows you how to traverse up or down the hierarchy, insert new nodes and much more.



Amazon Link: Expert SQL Server 2005 Development

I have also interviewed Adam Machanic a while back, you can find that here: Interview with Adam Machanic Author Of Expert SQL Server 2005 Development

C# IsNullOrEmpty Function In SQL Server

Mladen Prajdic has created a SQL equivalent of the C# IsNotNullOrEmpty
I looked at it and thought that there was way too much code

Here is my version which I have modified, you pass an additional parameter in to indicate whether you want blanks only to count or not


CREATE FUNCTION dbo.IsNotNullOrEmpty(@text NVARCHAR(4000),@BlanksIsEmpty bit)
RETURNS BIT
AS

BEGIN
DECLARE
@ReturnValue bit

IF
@BlanksIsEmpty = 0
BEGIN
SELECT
@ReturnValue= SIGN(COALESCE(DATALENGTH(@text),0))
END
ELSE
BEGIN
SELECT
@ReturnValue= SIGN(COALESCE(DATALENGTH(RTRIM(@text)),0))
END

RETURN
@ReturnValue
END
Go


Here are some calls where we want blanks to return as empty or null
The function returns = if it is empty and 1 if it is not empty

SELECT dbo.IsNotNullOrEmpty(null,1),dbo.IsNotNullOrEmpty('azas',1),
dbo.IsNotNullOrEmpty(' ',1),dbo.IsNotNullOrEmpty('',1)


Here are some calls where we don't want blanks to return as empty or null

SELECT dbo.IsNotNullOrEmpty(null,0),dbo.IsNotNullOrEmpty('azas',0),
dbo.IsNotNullOrEmpty(' ',0),dbo.IsNotNullOrEmpty('',0)

My function is the opposite of Mladen's I check for is NOT null or empty instead of IS null or empty (easier to code it with the SIGN function)

Wednesday, June 13, 2007

SQL Myth: Truncate Cannot Be Rolled Back Because It Is Not Logged

I am still amazed at how many people still think that TRUNCATE TABLE is not logged. There is some logging going on but it is minimal, here is what Books On Line says:

TRUNCATE TABLE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.

The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row.

Let’s prove that we can rollback a truncate

Create this table and do the select

CREATE TABLE dbo.Enfarkulator (ID int IDENTITY PRIMARY KEY, SomeOtherCol varchar(49))
GO
INSERT dbo.Enfarkulator VALUES(1)
INSERT dbo.Enfarkulator VALUES(1)



SELECT * FROM dbo.Enfarkulator

ID SomeOtherCol
1 1
2 1


Now run this part

BEGIN TRAN
TRUNCATE TABLE
dbo.Enfarkulator
SELECT * FROM dbo.Enfarkulator
ROLLBACK TRAN


ID SomeOtherCol
(0 row(s) affected)

As you can see the table was truncated, now select from the table again


SELECT * FROM dbo.Enfarkulator

ID SomeOtherCol
1 1
2 1


Yep, the data is there, proving that you can rollback a truncate and all the data will be there. There are two other major difference between truncate and delete which I will explain below.

Truncate doesn’t preserve the identity value but delete does

This is another difference between truncate and delete, truncate will reset the identity value but delete does not. Run the following code to see how that works


CREATE TABLE dbo.Enfarkulator2 (ID int IDENTITY, SomeOtherCol varchar(49))
GO
INSERT dbo.Enfarkulator2 VALUES(1)
INSERT dbo.Enfarkulator2 VALUES(1)


SELECT * FROM dbo.Enfarkulator2
SELECT * FROM dbo.Enfarkulator


DELETE dbo.Enfarkulator2
TRUNCATE TABLE dbo.Enfarkulator

INSERT dbo.Enfarkulator VALUES(1)
INSERT dbo.Enfarkulator2 VALUES(1)

SELECT * FROM dbo.Enfarkulator2
SELECT * FROM dbo.Enfarkulator

The Enfarkulator id was reset and the Enfarkulator2 id was not. In order to do the same with delete you will need to run a dbcc checkident reseed command. Here is the code for that.

DELETE dbo.Enfarkulator2
TRUNCATE TABLE dbo.Enfarkulator

DBCC CHECKIDENT (Enfarkulator2, RESEED, 0)

Now insert again and you will see that the values are the same.

INSERT dbo.Enfarkulator VALUES(1)
INSERT dbo.Enfarkulator2 VALUES(1)

SELECT * FROM dbo.Enfarkulator2
SELECT * FROM dbo.Enfarkulator



You can’t truncate tables that are referenced by a foreign key constraint.

If you have a table which is referenced by another table with a foreign key constraint then you cannot truncate that table. Here is the code for that

CREATE TABLE dbo.Enfarkulator3 (ID int IDENTITY, SomeOtherCol varchar(49))
GO
INSERT dbo.Enfarkulator3 VALUES(1)



Now let’s add the foreign key

ALTER TABLE dbo.Enfarkulator3 ADD CONSTRAINT [FK_Fark3_Fark]
FOREIGN KEY ([ID]) REFERENCES [dbo].[Enfarkulator] ([ID])


Now try to truncate.

TRUNCATE TABLE Enfarkulator

Server: Msg 4712, Level 16, State 1, Line 1
Cannot truncate table 'Enfarkulator' because it is being referenced by a FOREIGN KEY constraint.

See? You cannot do that

--Clean up time ;-)
DROP TABLE dbo.Enfarkulator3,dbo.Enfarkulator2,dbo.Enfarkulator


Cross-posted from SQLBlog! - http://www.sqlblog.com/

Friday, June 08, 2007

Three New SQL Server Best Practices Articles On TechNet

Predeployment I/O Best Practices

The I/O system is important to the performance of SQL Server. When configuring a new server for SQL Server or when adding or modifying the disk configuration of an existing system, it is good practice to determine the capacity of the I/O subsystem prior to deploying SQL Server. This white paper discusses validating and determining the capacity of an I/O subsystem. A number of tools are available for performing this type of testing. This white paper focuses on the SQLIO.exe tool, but also compares all available tools. It also covers basic I/O configuration best practices for SQL Server 2005.
On This Page

Overview

Determining I/O Capacity

Disk Configuration Best Practices & Common Pitfalls

SQLIO

Monitoring I/O Performance Using System Monitor

Conclusion

Resources



Partial Database Availability

This white paper outlines the fundamental recovery and design patterns involving the use of filegroups in implementing partial database availability in SQL Server 2005. As databases become larger and larger, the infrastructure assets and technology that provide availability become more and more important.

The database filegroups feature introduced in previous versions of SQL Server enables the use of multiple database files in order to host very large databases (VLDB) and minimize backup time. With data spanning multiple filegroups, it is possible to construct a database layout whereby failure of certain data resources do not render the entire solution unavailable. This increases the availability of solutions that use SQL Server and further reduces the surface area of failure that would render the database totally unavailable.



Comparing Tables Organized with Clustered Indexes versus Heaps

In SQL Server 2005, any table can have either clustered indexes or be organized as a heap (without a clustered index.) This white paper summarizes the advantages and disadvantages, the difference in performance characteristics, and other behaviors of tables that are ordered as lists (clustered indexes) or heaps. The performance for six distinct scenarios where DML operations are performed on these tables are measured and detailed observations presented. This white paper provides best practice recommendations on the merits of the two types of table organization, along with examples of when you might want to use one or the other.
On This Page

Introduction

Clustered Indexes and Heaps

Test Objectives

Test Methodology

Test Results and Observations

Recommendations

Appendix: Test Environment

SQL Teaser NULL vs COALESCE

Without running the code, try to guess the output

DECLARE @v1 VARCHAR(3)
DECLARE @i1 INT


SELECT
ISNULL(@i1, 15.00) /2,
COALESCE(@i1 , 15.00) /2,
ISNULL(@v1, 'Teaser #2'),
COALESCE(@v1, 'Teaser #2')

I hope you will use COALESCE instead of ISNULL from now on ;-)

Cross-posted from SQLBlog! - http://www.sqlblog.com/

Wednesday, June 06, 2007

How To Protect Yourself From Fat-Finger Sally, Crazy Bosses and Other SQL Villains

You all have been through this at least once in your life. In your shop there is this one person who likes to use Enterprise Manager as their Rapid Data Entry Application. We all know how these people operate; they delete rows, drop tables and all kinds of other funky stuff. SQL Server 2005 has DDL triggers to help you protect against these scoundrels. What about if you are still running that piece of software from the late Triassic period known as SQL Server 2000, what can help you in that case? Don’t worry I will show you a way but first I will tell you a story. About 6 years ago I worked in New York City as a consultant on a project for a nonprofit organization. I looked in the database and found this table which was named YesNoTable. I was curious I opened the table and noticed it had only 2 rows. Here is what was stored in the table.

0 no
1 yes

I dropped it immediately. 5 minutes went by and suddenly the CRM application was broken. They ran the debugger and found out a table was missing. Luckily for me it was very easy to recreate this table. And yes, we did get rid of it soon after. Now had the table be used by a view which had been created with schemabinding I would not be able to drop the table without dropping the view first. You see even I became a SQL villain one time.

What the code below does is it will loop through all the user created tables then union them all, I created a where 1 =0 WHERE clause just in case someone decides to open the view. Since a union can only have 250 selects or so, I have created the code so that you can specify how many tables per view you would like, you do that with the @UnionCount variable.

The code does print statements it does not create the views

If you run the code in the msdb database and you specify 5 as the @UnionCount your output will be this

-- ****************************
-- **** View Starts Here *****
-- ****************************
CREATE VIEW DoNotDropMe_1 WITH SCHEMABINDING
AS
SELECT 1 As Col1 FROM [dbo].[log_shipping_databases]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[log_shipping_monitor]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[log_shipping_plan_databases]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[log_shipping_plan_history]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[log_shipping_plans]
WHERE 1=0
GO


-- ****************************
-- **** View Starts Here *****
-- ****************************
CREATE VIEW DoNotDropMe_2 WITH SCHEMABINDING
AS
SELECT 1 As Col1 FROM [dbo].[RTblClassDefs]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[RTblDatabaseVersion]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[RTblDBMProps]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[RTblDBXProps]
WHERE 1=0
UNION ALL
SELECT 1 As Col1 FROM [dbo].[RTblDTMProps]
WHERE 1=0
GO


The code is not very complex if there are more tables in the DB than you specify in the @UnionCount variable then it will do them in chunks of whatever you specified, if there are less then it will do all of them in 1 view.
Below is the code, if you have any questions then feel free to leave a comment.

USE msdb
SET NOCOUNT ON

DECLARE @UnionCount int
SELECT @UnionCount = 20

IF @UnionCount > 250 OR @UnionCount <1
BEGIN
RAISERROR ('@UnionCount has to be between 1 and 250', 16, 1)
RETURN
END

SELECT identity(int,1,1) AS id,QUOTENAME(table_schema) + '.' + QUOTENAME(table_name) AS tablename
INTO #Tables
FROM information_schema.tables
WHERE table_type ='base table'
AND OBJECTPROPERTY(OBJECT_ID(table_name),'IsMSShipped') = 0
ORDER BY table_name

DECLARE @maxloop int
DECLARE @loop int
DECLARE @tablename varchar(200)




SELECT @maxloop = MAX(id)
FROM #Tables

BEGIN
DECLARE @OuterLoopCount int, @OuterLoop int

SELECT @OuterLoopCount = COUNT(*) FROM #Tables
WHERE id %@UnionCount =0

SELECT @OuterLoopCount = COALESCE(NULLIF(@OuterLoopCount,0),1)



IF (SELECT COUNT(*) FROM #Tables) % 10 <> 0
SELECT @OuterLoopCount = @OuterLoopCount +1

SELECT @OuterLoop =1

SELECT @Loop = MIN(id),@maxloop=MAX(id)
FROM #Tables WHERE ID <= @UnionCount * @OuterLoop

WHILE @OuterLoop <=@OuterLoopCount
BEGIN
SELECT @Loop = MIN(id),@maxloop=MAX(id)
FROM #Tables WHERE ID <= @UnionCount * @OuterLoop
AND id > (@UnionCount * @OuterLoop) - @UnionCount


PRINT'-- **************************** '
PRINT'-- **** View Starts Here ***** '
PRINT'-- **************************** '
PRINT 'CREATE VIEW DoNotDropMe_' + CONVERT(VARCHAR(10),@OuterLoop) + ' WITH SCHEMABINDING'+ char(10) + 'AS'
WHILE @Loop <= @maxloop
BEGIN
SELECT @tablename = tablename
FROM #Tables
WHERE id = @Loop
PRINT 'SELECT 1 As Col1 FROM ' + @tablename + char(10) + 'WHERE 1=0'
IF @Loop < @maxloop
PRINT UNION ALL'
SET @Loop = @Loop + 1

END
SET @OuterLoop = @OuterLoop + 1
PRINT 'GO'
PRINT ''
PRINT ''

END
END


DROP table #Tables


Cross-posted from SQLBlog! - http://www.sqlblog.com/

Monday, June 04, 2007

SQL Server 2008 Is RTM (According to SERVERPROPERTY('productlevel'))

SELECT @@VERSION,
SERVERPROPERTY('productversion'),
SERVERPROPERTY('productlevel')

Microsoft SQL Server code name "Katmai" - 10.0.1019.17 (Intel X86)
May 24 2007 15:26:55 Copyright (c) 1988-2007 Microsoft Corporation
Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 2)

10.0.1019.17
RTM (???)

Also interesting is that the tools (SSMS) are the same as with SQL Server 2005, If you have SQL Server 2005 already installed it will skip installing those.

Okay, this is the last Katmai post....for today.....I promise.....


Cross-posted from SQLBlog! - http://www.sqlblog.com/

SQL Server 2008 (Katmai) Cannot Be Installed On A PC With SQL Server 2000 On It

I tried installing the SQL Server 2008 – June CTP on one of my machines which had SQL Server 2000 installed; it does not let you do that. I guess this is the time to uninstall SQL Server 2000. That is just what I did. First thing I noticed is that MERGE statement is back. IIRC the MERGE statement was also in SQL Server 2005 Beta2 but got pull out of the RTM

Here is a small example of using MERGE from the Katmai Books On Line

MERGE FactBuyingHabits AS fbh
USING (SELECT CustomerID, ProductID, PurchaseDate FROM PurchaseRecords) AS src
ON (fbh.ProductID = src.ProductID AND fbh.CustomerID = src.CustomerID)
WHEN MATCHED THEN
UPDATE SET fbh.LastPurchaseDate = src.PurchaseDate
WHEN NOT MATCHED THEN
INSERT (CustomerID, ProductID, LastPurchaseDate)
VALUES (src.CustomerID, src.ProductID, src.PurchaseDate);




Cross-posted from SQLBlog! - http://www.sqlblog.com/

SQL Server 2008 June CTP now available!

From the connect site

Preview upcoming Releases: SQL Server 2008 June CTP now available!
We are always working on something new to make SQL Server even better. We now have SQL Server 2008 June CTP available for testing.

Link doesn't work yet, check back later here: https://connect.microsoft.com/SQLServer

SQL Server 2008/Katmai Webcast

TechNet Webcast: The Next Release of Microsoft SQL Server: Overview (Level 200)

Simulcast from Microsoft Tech·Ed 2007 in Orlando, FL.
This session provides an overview of the next release of Microsoft SQL Server that is currently under development. We cover the core value proposition, major themes and scenarios, and some specific improvements. We also discuss the new development processes Microsoft is using to build this release, the release timeline, and the disclosure calendar.

More details here: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032341071&EventCategory=2&culture=en-US&CountryCode=US

Sunday, June 03, 2007

Did You Know SQL Server Has A Black Box Like An Airplane?

Paul Randal writes:
"Kimberly mentioned that SQL Server has a 'black-box' trace, similar to an
aircraft flight-recorder, which I'd never heard of. It's an internal trace that
has the last 5MB of various trace events and it's dumped to a file when SQL
Server crashes. This can be really useful if you're troubleshooting an issue
that causing SQL Server to crash or someone or something is telling SQL Server
to shutdown and its unclear who or what is doing it."


Read here how to turn it on: http://blogs.msdn.com/sqlserverstorageengine/archive/2007/06/03/sql-server-s-black-box.aspx

Cross-posted from SQLBlog! - http://www.sqlblog.com/

Excel 2007 Files Are WinZip/WinRAR Files In Disguise

I am not going to say anything about the ribbon or the diamond, this not a post about that. Create a new file in Excel 2007 with 250000 (that is a quarter million) rows. Just have 1 column with numbers from 1 to 250000. Save the file but keep it open. I applied a currency format to mine, see pic below.




Just for fun save it also in Excel 2003 format, you will get a warning and the file will only contain 65536 rows. I also captured that as a screenshot, the pic is below.




Now look at the files, do you see that the Excel 2003 file is bigger in size than the Excel 2007 file even though it has only one fourth the rows. Now locate the file and rename the xlsx extension of the Excel 2007 file to rar or zip. That is right the file is compressed, open the file up with WinZip or WinRAR. Open the Worksheets folder inside the xl folder, you will see a worksheet there which is about 17MB unpacked and about 2 MB packed. The pic of the RAR file is below.



Interesting stuff right?


Cross-posted from SQLBlog! - http://www.sqlblog.com/

Thursday, May 31, 2007

Sam Gentile Reviews RubyMicrosoft Essay By Martin Fowler

Martin Fowler asked Sam Gentile to review and contribute thoughts to his very important new essay RubyMicrosoft. This piece presents the view that Microsoft is at a crossroads, an important time in its life where Microsoft can make good of opportunities or choose an altogether different road.

Read the article here: http://codebetter.com/blogs/sam.gentile/archive/2007/05/31/microsoft-at-the-crossroads.aspx

Here is the most interesting comment

On the alpha geek side, I fear all is lost already. All of my peers onCodeBetter.com and the "Agile .NET" community have already moved ontoCastle/Windsor, NUNit, NAnt, MonoRail, Spring.NET, NHibernate, etcinstead of Microsoft solutions. It’s virtually over already. For twoyears now, I have talked about our Agile team and how we can't useVisual Studio Team System and instead have to use CruiseControl.NET,NUnit, NAnt, etc to work in an Agile fashion. Not only does Microsoftnot understand this, but the majority of Microsoft programmers don't.They have been weaned on being "Morts" and having wizards, storedprocedures, drag & drop forced on them and not required to learn thesolid skills that make up what we think of as a developer


What is your opinion? We are using Subversion and currently testing with CruiseControl, NAnt and NUnit

Cross-posted from SQLBlog! - http://www.sqlblog.com

Wednesday, May 30, 2007

Speed Up Performance And Slash Your Table Size By 90% By Using Bitwise Logic

You have all seen websites where you can pick a bunch of categories by selection a bunch of check boxes. usually what you do is store those in a lookup table and then you create another table where you store all the categories for each customer.
What if I tell you that you can store all that info in 1 row instead of 10 rows if a customer picked 10 categories.
Take a look at this




1 Classic Rock
2 Hard Rock
4 Speed/Trash Metal
You will store a value of 1 + 2 + 4 = 7(you just sum the values)

Now run this to check, the result will be 7 for a match and some other value otherwise



select 7 | 1,
7 | 2,
7 |3,
7 |4,
7 |5,
7 |6,
7 |7,
7 |8,
7 |20



What is this |(pipe symbol)?
From Books on line
The bitwise operator performs a bitwise logical OR between the two expressions, taking each corresponding bit for both expressions. The bits in the result are set to 1 if either or both bits (for the current bit being resolved) in the input expressions have a value of 1; if neither bit in the input expressions is 1, the bit in the result is set to 0.
The bitwise operator requires two expressions, and it can be used on expressions of only the integer data type category.




Here is how you would typically use this, first create this table




CREATE TABLE NumbersTable (Num int)
INSERT NumbersTable VALUES(1)
INSERT NumbersTable VALUES(2)
INSERT NumbersTable VALUES(3)
INSERT NumbersTable VALUES(4)
INSERT NumbersTable VALUES(5)
INSERT NumbersTable VALUES(6)
INSERT NumbersTable VALUES(7)
INSERT NumbersTable VALUES(8)
INSERT NumbersTable VALUES(9)
INSERT NumbersTable VALUES(10)
INSERT NumbersTable VALUES(11)
INSERT NumbersTable VALUES(12)
GO

Now run this

SELECT
Num,
CASE 7 |Num WHEN 7 THEN 'Yes' ELSE 'No' END AS COL
FROM NumbersTable

Here is the output




Num COL
---- ---
1 Yes
2 Yes
3 Yes
4 Yes
5 Yes
6 Yes
7 Yes
8 No
9 No
10 No
11 No
12 No




Okay enough theory let's start with some SQL code. First create this table which will hold all the categories




CREATE TABLE MusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))




INSERT MusicChoice VALUES(1,'Classic Rock')
INSERT MusicChoice VALUES(2,'Hard Rock')
INSERT MusicChoice VALUES(3,'Speed/Trash Metal')
INSERT MusicChoice VALUES(4,'Classical')
INSERT MusicChoice VALUES(5,'Rap')
INSERT MusicChoice VALUES(6,'Blues')
INSERT MusicChoice VALUES(7,'Jazz')
INSERT MusicChoice VALUES(8,'Alternative Rock')
INSERT MusicChoice VALUES(9,'Easy Listening')
INSERT MusicChoice VALUES(10,'Progressive Rock')
INSERT MusicChoice VALUES(11,'Punk Rock')
INSERT MusicChoice VALUES(12,'Swing')
INSERT MusicChoice VALUES(13,'Techno')
INSERT MusicChoice VALUES(14,'Pop')
INSERT MusicChoice VALUES(15,'Disco')
INSERT MusicChoice VALUES(16,'Big Band')
INSERT MusicChoice VALUES(17,'Gospel')
INSERT MusicChoice VALUES(18,'Heavy Metal')
INSERT MusicChoice VALUES(19,'House')
INSERT MusicChoice VALUES(20,'Celtic')
Now create the Bitwise table

CREATE
TABLE BitwiseMusicChoice (ID INT PRIMARY KEY,
ChoiceDescription VARCHAR(100))





We will use the POWER function to create the correct values
run this

SELECT
id,POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice




Here is the output
id BitID ChoiceDescription
1 1 Classic Rock
2 2 Hard Rock
3 4 Speed/Trash Metal
4 8 Classical
5 16 Rap
6 32 Blues
7 64 Jazz
8 128 Alternative Rock
9 256 Easy Listening
10 512 Progressive Rock
11 1024 Punk Rock
12 2048 Swing
13 4096 Techno
14 8192 Pop
15 16384 Disco
16 32768 Big Band
17 65536 Gospel
18 131072 Heavy Metal
19 262144 House
20 524288 Celtic




Now insert it into the BitwiseMusicChoice table




INSERT BitwiseMusicChoice
SELECT POWER(2,id-1)BitID,ChoiceDescription
FROM MusicChoice




Now create this customer table


CREATE
TABLE Customer (CustomerID int identity, CustomerCode uniqueidentifier not null)




Insert these 5 values first, we will use these to compare performance later




INSERT Customer VALUES('1DAB5C03-BC23-4FB5-AC3D-A46489459FE9')
INSERT Customer VALUES('F7DDCDBC-F646-493A-B872-4E2E82EA8E14')
INSERT Customer VALUES('E8A4C3D2-AEB0-4821-A49D-3BF085354448')
INSERT Customer VALUES('52581088-C427-4D2F-A782-250564D44D8C')
INSERT Customer VALUES('1B2622C4-6C17-4E74-99D6-336197FBBCFF')

Now we will insert a total of 10000 customers




SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
SET
@LoopCounter = 6
WHILE @LoopCounter <= 10000
BEGIN
INSERT
Customer VALUES(NEWID())
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO




Now add the primary key




ALTER TABLE Customer ADD CONSTRAINT pk_Customer PRIMARY KEY (CustomerCode)

Create another table to hold the choices

CREATE
TABLE CustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier)




ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_MusicChoice_ID FOREIGN KEY (MusicChoiceID) REFERENCES MusicChoice(ID)




ALTER TABLE CustomerMusicChoice ADD CONSTRAINT fk_CustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)




For each customer insert 10 random choices, this should run less than a minute



SET NOCOUNT ON
BEGIN
TRAN
DECLARE
@LoopCounter INT
DECLARE
@CustID uniqueidentifier
SET
@LoopCounter = 1
WHILE @LoopCounter <= 10000
BEGIN
SELECT
@CustID = CustomerCode
FROM Customer
WHERE CustomerID = @LoopCounter
INSERT Customer VALUES(NEWID())
INSERT CustomerMusicChoice(MusicChoiceID,CustomerCode)
SELECT TOP 10 id,@CustID
FROM MusicChoice
ORDER BY NEWID()
SET @LoopCounter = @LoopCounter + 1
END
COMMIT
WORK
GO

Now add these indexes
CREATE INDEX ix_CustomerMusicChoice_Cust On CustomerMusicChoice(CustomerCode)



CREATE INDEX ix_CustomerMusicChoice_ID On CustomerMusicChoice(MusicChoiceID)




Create the BitwiseCustomerMusicChoice which will hold the Bitwise values




CREATE TABLE BitwiseCustomerMusicChoice (id INT identity, MusicChoiceID int, CustomerCode uniqueidentifier not null)




This will populate the BitwiseCustomerMusicChoice table




INSERT INTO BitwiseCustomerMusicChoice
SELECT SUM(POWER(2,MusicChoiceID-1)) as MusicChoiceID,CustomerCode
FROM CustomerMusicChoice
GROUP BY CustomerCode




Add the index and foreign key




ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT pk_BitwiseCustomerMusicChoice PRIMARY KEY (CustomerCode)




ALTER TABLE BitwiseCustomerMusicChoice ADD CONSTRAINT fk_BitwiseCustomerCode FOREIGN KEY (CustomerCode)REFERENCES Customer(CustomerCode)

Now let's test performance. Hit CTRL + K (SQL 2000) or CTRL + M (SQL 2005)



These 2 queries will return something like this




ID ChoiceDescription Picked
8 Alternative Rock No
16 Big Band No
6 Blues No
20 Celtic No
1 Classic Rock No
4 Classical Yes
15 Disco Yes
9 Easy Listening Yes
17 Gospel No
2 Hard Rock No
18 Heavy Metal Yes
19 House Yes
7 Jazz Yes
14 Pop Yes
10 Progressive Rock Yes
11 Punk Rock No
5 Rap No
3 Speed/Trash Metal Yes
12 Swing Yes
13 Techno No


SELECT
mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription




SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
ORDER BY ChoiceDescription




Look at the execution plan
67.60% against 32.40% not bad right?



Plan1




Now run this, we will add AND bmc.ID > 0 to both queries. This will change an index scan to an index seek in the bottom query




SELECT mc.ID,ChoiceDescription,CASE WHEN CustomerCode IS NULL THEN 'No' ELSE 'Yes' END Picked
FROM CustomerMusicChoice cmc
RIGHT JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
AND CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND mc.ID > 0
ORDER BY ChoiceDescription




SELECT bmc.ID,ChoiceDescription,
CASE WHEN bmc.ID |MusicChoiceID =MusicChoiceID THEN 'Yes'
ELSE 'No'
END AS Picked
FROM BitwiseCustomerMusicChoice cmc
CROSS JOIN BitwiseMusicChoice bmc
WHERE CustomerCode ='1DAB5C03-BC23-4FB5-AC3D-A46489459FE9'
AND bmc.ID > 0
ORDER BY ChoiceDescription



That improved the performance a little. 82.75% against 17.25%





Plan2




Now look at the tables, after running dbcc showcontig you can see that the BitwiseCustomerMusicChoice is about 1/10th the size of the CustomerMusicChoice table which is as expected.


dbcc showcontig ('BitwiseCustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'BitwiseCustomerMusicChoice' table...
Table: 'BitwiseCustomerMusicChoice' (772197801); index ID: 1, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 41
- Extents Scanned..............................: 6
- Extent Switches..............................: 5
- Avg. Pages per Extent........................: 6.8
- Scan Density [Best Count:Actual Count].......: 100.00% [6:6]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 48.0
- Avg. Page Density (full).....................: 99.41%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.




dbcc showcontig ('CustomerMusicChoice')
---------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'CustomerMusicChoice' table...
Table: 'CustomerMusicChoice' (724197630); index ID: 0, database ID: 26
TABLE level scan performed.
- Pages Scanned................................: 428
- Extents Scanned..............................: 55
- Extent Switches..............................: 54
- Avg. Pages per Extent........................: 7.8
- Scan Density [Best Count:Actual Count].......: 98.18% [54:55]
- Extent Scan Fragmentation ...................: 40.00%
- Avg. Bytes Free per Page.....................: 386.5
- Avg. Page Density (full).....................: 95.22%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


What happens if you want to get the total count of for example Classical?




SELECT COUNT(*)
FROM CustomerMusicChoice cmc
JOIN MusicChoice mc on cmc.MusicChoiceID = mc.id
WHERE mc.ChoiceDescription ='Classical'

SELECT COUNT(*)
FROM BitwiseCustomerMusicChoice cmc
JOIN BitwiseMusicChoice bmc ON bmc.ID |MusicChoiceID =MusicChoiceID
WHERE bmc.ChoiceDescription ='Classical'

Here are execution plans for SQl Server 2000 and 2005

Plan3A

Plan3B

As you can see SQL Server 2005 has a bigger difference than SQL Server 2000



Now let's look at the overal picture, on a busy system you will have the customer queries running many times an hour/day. The report queries will run maybe a couple a times a day. I think this trade off is perfectly acceptable because overall your system will perform better. Another thing to keep in mind is that instead of 10 inserts you only have to do 1, same with updates, all these little things add up to a lot eventualy.




So as you can see using bitwise logic is a great way to accomplish a couple of things
Reduce table size
Speed up backup and recovery because your table is much smaller
Improve performance



Of course you have to do some testing for yourself because it might not be appropriate for your design. If your system is more of an OLAP than OLTP type of system then don't bother implementing this since it won't help you.




Cross-posted from SQLBlog! - http://www.sqlblog.com