SQL Server Code,Tips and Tricks, Performance Tuning: Book

Showing posts with label Book. Show all posts

Thursday, September 10, 2015

Review of SQL Performance Explained by Markus Winand

SQL Performance Explained is a database book written by Markus Winand, the book is a little less than 200 pages. The book covers Oracle, MySQL, PostgreSQL and SQL Server. The terminology used in the book is that of the Oracle RDBMS but there are site notes for the other RDBMS products. The primary focus of this book is indexing, if you are interested in indexing then this book is for you, the one index this book covers in the B-tree Index, other indexes are not covered in this book at all.

There are 8 chapters in this book and an appendix, here is what is in the book.

CHAPTER 1 - Anatomy of an Index
The first chapter is the only one that doesn't cover SQL specifically; it is about the fundamental structure of an index. An understanding of the index structure is essential to following the later chapters — don’t skip this!
Although the chapter is rather short — only about eight pages — after working through the chapter you will already understand the phenomenon of slow indexes.

CHAPTER 2 - The Where Clause
This is where we pull out all the stops. This chapter explains all aspects of the where clause, from very simple single column lookups to complex clauses for ranges and special cases such as LIKE. This chapter makes up the main body of the book. Once you learn to
use these techniques, you will write much faster SQL.

CHAPTER 3 - Performance and Scalability
This chapter is a little digression about performance measurements and database scalability. See why adding hardware is not the best solution to slow queries.

CHAPTER 4 - The Join Operation
Back to SQL: here you will find an explanation of how to use indexes to perform a fast table join.

CHAPTER 5 - Clustering Data
Have you ever wondered if there is any difference between selecting a single column or all columns? Here is the answer — along with a trick to get even better performance.

CHAPTER 6 - Sorting and Grouping
Even order by and group by can use indexes.

CHAPTER 7 - Partial Results
This chapter explains how to benefit from a “pipelined” execution if you don’t need the full result set.

CHAPTER 8 - Insert, Delete and Update
How do indexes affect write performance? Indexes don’t come for free — use them wisely!

APPENDIX A - Execution Plans
Asking the database how it executes a statement.

So who is this book for? This book is primarily for developers, if you are an admin and you want to know about how indexing work, then this book will help you understand that. Don't expect to find how to reorganize or rebuild indexes, that is not the focus of this book.

I enjoyed this book a lot, there is a ton of useful information for developers and as a developer you should understand how data is stored and how you can get to the data more efficiently. Once you are done with this book, you will understand databases and retrieval of data much better and you might even rethink how you are storing some of the data. I also like the fact that this book is a little less than 200 pages and focused one one thing instead of those monster 1000+ page books that cover dozens of topics but don't delve deep into any of those topics.

I give this book two thumbs up because it does what the author's intentions, is focused on indexing and explains it well. I do have one little bone to pick with the book and that is that the book does not mention the XML Execution plan that was introduced with SQL Server 2005.

You can check out the book at the book's website here: http://sql-performance-explained.com/

You can also check Amazon here: SQL Performance Explained

Tuesday, October 13, 2009

Win A Copy Of SQL Server 2008 Administration in Action

Rod Colledge was kind to give the readers of Less Than Dot a chance to win a pdf copy of his book SQL Server 2008 Administration in Action.

Here is how you can win this book, first read the Interview With Rod Colledge About The Book SQL Server 2008 Administration in Action post and tell me which question or answer you liked the most and why. Leave a comment on that post, do not leave it here. I will announce a winner this Friday at 12 PM EST here: Interview With Rod Colledge About The Book SQL Server 2008 Administration in Action

Thursday, September 18, 2008

Interview With Louis Davidson Author of Pro SQL Server 2008 Relational Database Design and Implementation

I interviewed Louis Davidson in 2005 about the 2005 edition of his book. I did another one today about Pro SQL Server 2008 Relational Database Design and Implementation. You can check out the interview at the link below, you will enjoy it

Interview With Louis Davidson Author of Pro SQL Server 2008 Relational Database Design and Implementation

Monday, May 19, 2008

Spring Cleaning, Getting Rid Of Some Books

I did some spring cleaning yesterday, I got rid of some old or obsolete books. I asked some people at work if they want any of these books and I believe all the books except for the JCL book are about to be given away.

I got rid of these books because:
I got a new version
I don't use the technology anymore
The book is obsolete

Click on the image and you can see a bigger version on my flickr page.

Thursday, February 07, 2008

Review of Inside Microsoft SQL Server 2005 Query Tuning and Optimization

SQL performance tuning is probably one of those things you can do to really make a HUGE difference in performance. Let’s put this in perspective: take a typical application, if you can improve the performance by 100% then you really made a huge improvement. You can improve a SQL query by 1000% with 2 lines of code (sometimes all you have to do is take away a % sign). If you can make a query sargable so that the optimizer can do an index seek instead of an index scan your query might go from 12 seconds to 200 milliseconds. Now try doing that in an application, even if you change all the string concatenation to use a stringbuilder instead of creating new strings all the time you will not get such a drastic performance improvement. I am sure you get the point by now, let’s talk about the book.

Inside Microsoft SQL Server 2005: Query Tuning and Optimization is part 4 of the Inside Microsoft SQL Server 2005 series, it is written by Kalen Delaney and five other authors. There are 6 chapters in this book

1 A Performance Troubleshooting Methodology

This chapter explains some typical things that affect performance and also gives a troubleshooting overview

2 Tracing and Profiling

This chapter explains how to use the profiler and how to analyze traces. SQL Server’s built-in traces are also covered

3 Query Execution

This chapter gives a query processing and execution overview. It explains how to read plans and goes into a lot of detail about analyzing plans

4 Troubleshooting Query Performance

This chapter explains how to detect problems in plans, how to improve queries and some best practices

5 Plan Caching and Recompilation

This chapter goes into detail about plan caching and recompilation and how to troubleshoot plan cache issues

6 Concurrency Problems

The final chapter deals with concurrency (locking, blocking and deadlocking)

This is an excellent book for an intermediate/advanced developer. There is so much new stuff in SQL Server 2005 compared to 2000 to help you with tuning queries that you probably want to read each chapter several times. The Dynamic Management Views are a big help and this book shows you how to use them. Some other cool stuff in this book is the discussion of internal tables, undocumented DBCC commands and undocumented trace flags to discover information which could help you determine much faster what the cause of a performance problem might be.
Some pages are packed with so much information that you need to pause for a second and process all that info (I have read some pages two to three times in a row). You will also find out that there are more joins besides left, full and outer. Page 137 for example has a nice table with the three Physical Join Operators: Nested Loop Join, Hash Join and Merge Join. This table lists the characteristics for each of these joins.

If you are an intermediate to advanced developer then I highly recommend this book. I have interviewed Kalen a while back about this book and you can find that interview here
http://sqlservercode.blogspot.com/2007/10/interview-with-kalen-delaney-about.html

Monday, December 17, 2007

Interview With Joe Celko About The Forthcoming Book Thinking In Sets

I noticed that Joe Celko has a new book coming out soon: Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL. I decided to send Joe an email to see if he would be willing to answer some questions I had about the book and SQL in general. To my surprise Joe was more than willing to accommodate my request. The question-and-answer session with Joe that follows was conducted via email.

Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
I would say intermediate level. You need to know enough SQL to do some programming in the language and be experienced enough to see that DDL is as important as DML.

What are the most important things a person can do to master SQL?
The most important thing is to make the leap from procedural programming to declarative programming, from sequential files to sets. The only declarative language that most programmers have seen is spreadsheets. They are nothing like SQL!

I assume you wrote this book because people when they first use a database tend to do the same thing they do in Java, C# or other procedural languages; get a bunch of rows and manipulate them one by one. Is this the number one mistake beginners do?
That is right up there in the top five, for sure! But I think that the classic error is in not knowing how to design the schema in the first place. A bad schema -- denormalized, bad data element names, no constraints, no proper keys, no referential integrity -- leads to trying to correct the flaws in DDL in the DML. If you have a good schema, then the queries, inserts, updates, and deletes are usually fairly easy. All of the "hard work" should be hidden in the database and not spread over the application code and DML.

What is so bad about attribute splitting (somehow these three tables come to mind: Squids, Automobiles and Britney Spears)?
NO, no, no! I coined the phrase a "Squids, Automobiles and Britney Spears" table or column to refer to a table or column which has more than one kind of entity or more than one kind of attribute in it. What makes that phrase so fun is that there is a web page which compared photos of a giant squid and Britney Spears after she cut off her hair. If you actually gave such tables or columns a meaningful name, then you would see that it is not a proper table or column. These nightmares would have names like "hat_or_shoe_size_depending_on_column_five" Attribute splitting is when you take an attribute and make it into two or more tables, columns or rows. The table example is the most common one. How often have you seen a table for each location (split on geography) or a table for each month (temporal split)? This mimics a tape file system, so newbies feel comfortable. When you ask them if they would split a Personnel table into MalePersonnel and FemalePersonnel (split on gender), they immediately see the fallacy. Unfortunately, splits need all kinds of code if they want any data integrity. This code usually re-assembles the data elements back to what they should have been in the first place.

In chapter two of your new book you talk about the new advances in hardware. I have noticed that somehow the amount of data I have to store out paces the advance in hardware and the queries don't run any faster. Will Solid-State Disks finally change that?
I have great hopes for solid state devices. Any solid thing that has to physically move is by definition slower than electricity or light. We are looking at nano-tech, better lasers and a ton of new technology almost every month.

I see you have a whole chapter on auxiliary tables, I am a big fan of those tables myself and I use them to create dates on the fly or split comma delimited strings. In your opinion what percentage of developers does not use them and why should you use them?
Not enough. SQL is a language meant to handle data and not to do computations. Auxiliary tables can be shared among sessions and accessed in parallel. Imagine a complicated but deterministic formula. In a procedural computational model, you hang in a loop and re-do the work for each record in a sequential file. Let me give you a real-world example of that. Corky's BBQ does a huge mail-order business at Christmas time. The pick lists need to include the size of the boxes to be used, so part-time help can do their jobs. When I got this problem, they had discovered that using weights was not right way to go. The approach being attempted was to play "3-D Tetris" with the products. Since that kind of thing is a bitch to program, they were getting nowhere. My approach was to look over a few years of shipping history and find out how many different orders they had shipped and what the smallest box use for each of those orders was. There were only about 5000 configurations and the majority were standard gift packages. Do a relational division and a table look-up to handle 99.98% of the cases and print the phrase "Hand pick this order" on the pick list for the exceptions! In the SQL model, you join an auxiliary table which has the parameters and the result value in each row. You can do this simple multi-column equi-join once in parallel. Wait until the multi-core chips make parallelism the only way to design a computer; then auxiliary table are going to really fly.

I noticed you have a big chapter on VIEWs; do you feel that VIEWs are not utilized enough by developers?
They are used either not enough or too much.

Why do you think we should not use bit or Boolean flags in SQL?
In SQL, to be a data type it must allow NULLs. What is the fundamental nature of a BIT? It is one or zero; there is no NULL concept here! This was a problem for SQL Server, when they made their BIT data type into a numeric that could be NULL-able. The change scrambled a lot of data when it was ported from one release to the next. Bits are low-level, hardware dependent concepts. Are you high end or low end hardware? Do you have 4, 8, 16, 32 or 64 bit machine words? You have use proprietary operators. This defeats the idea of machine independence. Finally, bit flags are used to destroy First Normal Form (1NF) and thus destroy data integrity. As an example, recently in as newsgroup someone wanted to use a 4-bit column to store all the possible colors for a product (red, green, blue, yellow) and get them out with a bit mask (his hardware has a nibble!). But how do you add purple? How do you set up a constraint that no item can be mad with both red and green options? In RDBMS, we discover the state of our data with predicates and not by setting flags at the hardware level.

I have your puzzles book and noticed that you have a paragraph on Sudoku and one on Bin Packing in the last chapter of this book. You have procedural solutions and SQL solutions for most of the material in the chapter; is the SQL solution faster?
There is a funny story on the Sudoku problem. Richard Romley is a retired DBA from Smith-Barney and he decided to play with Sudoku for recreation. He used SQL Server on his home machine and coded a solver in one SELECT statement. The procedure takes 81 parameters (the starting grid) and does an 81-way self-join. It produces ALL the valid answers -- bet you did not know that many published puzzles have multiple solutions! The code is straight forward and depends on the optimizer to handle the search condition logic. Even longer problems with tens of answers run in well under one second. The procedural solvers vary, but I have seen some that stop when they get to the first valid grid. If there is only one solution, they are very fast. But is the right answer actually the *set* of valid grids? Since I am an SQL person, I think so. The procedural solvers can get hung up by backtracking to the starting position when there are a few hundred answers and become very slow. I also strongly recommend getting the Japanese or Chinese editions if you read either language. My two translators cleaned up some old code and added new solutions as we went along.

Should having good naming conventions such as 11179-5 be included in database courses?
Drop them in as soon as you start. If you grow up with good conventions, you will start doing it without thinking about it. When I teach RDBMS, I start with scales and measurement theory so that my students know what data is all about -- whether it is in a database or not.

When can we expect your new book to show up in bookstores?
It was supposed to be out in 2008 February, but we lucked up and it will be out in 2008 January. Production was faster than planned. I guess after seven books, and working with the same people, we have it down pretty well!

A bunch of questions not related to the book

Why do you write technical books?
I have no talent for fiction. I cannot get a plot or characters onto paper to save my life and my dialogues are awful chains of "he said-she said" stiff sentences. My grandfather wrote children's poems in Slovak, and I have even less talent for poetry. I have a number of friends who write detective novels, Science Fiction and Fantasy. They don't consider me a real author because I don't do fiction. I think about trying my hand at YA (Young Adult) books -- Danica McKellar (Winnie from the WONDER YEARS television show) just did a math book for girls, so maybe I could do "A Child's Garden of Normal Forms" or a juvenile detective series called "The Hardware Boys", then go on to a television show called "Query Eye for the Database Guy" or something.

Which of your books was the hardest to write and why?
First edition of SQL FOR SMARTIES! It was my first book and I thought that having written a few hundred magazine columns would make it easy. I was dead wrong -- completely different skill set. I was a year late in delivering the manuscript. After that, I had a system in place.

What is coming down the line? Any new books or updates to current ones?
I am trying to do at least one book per year -- more if I am unemployed and need the advances. My current thoughts are a book on the use of Standards in a database, and one on programming tricks with OLAP functions, CTEs, and other new features in the SQL-2003 Standards. My Morgan-Kaufmann books tend to follow a five year cycle, just like the ANSI/ISO Standards. I also get asked by vendors to do product specific books. I might self-publish something completely off-topic. I have a book on domino games based on my postings and I teach Texas-42 on Royal Carri bean Cruises -- it is a domino game only played in Texas. I also have a book on Pai Gow (a gambling game played with Chinese dominoes) which might sell 10 copies. I will also be doing some video classes, but I don't have details yet.

Which of all the SQL books that you wrote is your favorite?
DATA & DATABASES, which never got the sales of the others. It is more philosophical and concerned with the nature of data instead of programming.

What SQL Server books are on your bookshelf?
Anything I can get by Henderson, Machanic, Moreau, Ben Gan and Delaney. The SQL Server experts are pretty well-known and they publish. This is not true for other products, especially the open-source RDBMS products.

Why do you participate in newsgroups and do you think it is a good idea for beginners to ask questions in newsgroups?
To do some shameless self-promotion, of course :) Newsgroups are a good source of SQL problems and some clever answers that I can use in books and when I am consulting. I also have a pedantic streak I did to get out. And if I am available on a newsgroup, people don't fill up my mailbox at home. And, yes, beginners should use newsgroups for help. But not to have someone else do their job or their homework for them. I like to see the mindset of people who are just learning SQL. It is not enough to see that someone is making a mistake; you want to figure out what lead to that particular mistake. Remember Chernobyl? Everyone did just what they were supposed to do, but there were a few critical assumptions that lead to an event cascade.

What are Cowboy Coders and id-iots?
The term "cowboy coder" is an old one. It means someone who starts coding without any design phase, without any overview to the system as a whole, without any research for industry standards or a company data dictionary. They usually love dialect code and tricks that trade immediate performance for maintainability. The heavy dialect code also gives them job security, since they usually only know one product. An "ID-iot" is a newbie who has no RDBMS education and wants to have the comfort of a sequential file system. So he puts an IDENTITY column on every table as the PRIMARY KEY. Never mind that it is proprietary and non-relational; it is the familiar record number from a file system which can use to mimic pointer chains. He does not understand that rows are not records, tables are not files, columns are not fields and references are constraints and not pointers.

I have been working with Sybase IQ for a little bit; what is your opinion on columnar databases?
Sybase IQ is not the only game in town. I consulted with SAND (nee Marcus, nee Nucleus) years ago. It was one of the first such products. Later I ran into WX2 (nee White Cross) and I am looking at Stonebreaker's Verticia now. Their advantages in parallelism and compressing large amounts of data make them the best choice for Data Warehouses. I would also look at Teradata, which uses hashing. That will become more important as the research on minimal perfect hashing functions gets out of the lab and into products.

Where can we expect to see you in 2008? Any conferences, seminars, trade shows or classrooms perhaps?
I will hopefully be doing some more "SQL Saturdays!" on my weekends. I want to do more webcasts, but I am not sure if I am ready for YouTube. My other travel goal for 2008 is to get to Australia or Japan; I have never gone past Hawaii.

Some of Joe Celko's Books:
SQL for Smarties
SQL Programming Style
Trees and Hierarchies in SQL
SQL Puzzles and Answers
Data and Databases

Tuesday, November 06, 2007

Christmas Is Coming Early This Year

Note from my manager:

"All,

It looks like we have some book money left for this year that we can spend.
Please select 2 books each that you have wanted and send the titles to me and I’ll get them on the list."

Very nice, I picked the following two books below.
The Art of SQL
SQL Server 2005 Practical Troubleshooting: The Database Engine

What would you have picked?

Monday, October 15, 2007

Interview With Kalen Delaney About Inside Microsoft SQL Server 2005 Query Tuning and Optimization

I am a big fan of Inside Microsoft SQL Server 2005 The Storage Engine so when I saw that yet another book got published in this series I just had to check it out. Inside Microsoft SQL Server 2005: Query Tuning and Optimization is very well written gets to the point and give you the answers that you need. I decided to contact Kalen to see if she would be willing to do an interview, by now you know that the answer is yes of course ;-)

The question-and-answer session with Kalen that follows was conducted via email.

What is the audience for this book, is it the enterprise user or can a small department benefit from the tips in this book?
Because this book deals with query tuning, anyone who writes SQL queries for SQL Server can benefit. Very little in the book is geared towards system tuning, so the size of the machine doesn’t really matter. Now of course, if you have a very small system with very small tables, you won’t get as much benefit out of tuning your queries. However, if you have any tables of more than a few thousand rows, and you do any joins, you will need to tune your queries. In addition, the issues of blocking and concurrency control can impact any system, no matter how small.

What new technologies in SQL Server 2005 do you think are the most beneficial for performance?
For very large databases, the best new technology is partitioning. For any size system, if you have had serious performance problems due to blocking, you might find a big performance benefit by using one of the snapshot-based isolation levels, but you really need to understand the resource costs that come along with the improved performance. For your individual queries, I think the new optimizer hints and query level recompiles can make a big difference. For indexes, the ability to add included columns to nonclustered indexes can give some of your hard-to-tune queries a major performance boost.

What will a person who reads this book gain in terms of understanding how to performance tune a server?
The focus of this book is not so much on tuning the server, but on tuning queries. There is more in Inside SQL Server 2005: The Storage Engine on server issues such as memory and processor management. The biggest server wide issues are covered in Chapter 5, when I talk about managing the plan cache, and how and when query plans are reused.

Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
The book is not geared towards beginners, but everyone should be able to get something out of it. It’s primarily geared to SQL Server developers and DBA’s who have been working with SQL Server for a while, and have encountered performance problems that they are trying to find solutions for.

With all the changes in SQL Server 2005, how critical has the tempdb become in regards to performance?
Tempdb has always been important. In SQL Server 2005, if you are using one of the snapshot-based isolation levels, you are going to have to be more aware of the demands placed on tempdb, both in the sizing requirements and the additional I/O resources needed. Fortunately, SQL Server 2005 provides tools to monitor tempdb, including a dozen new performance monitor counters, and a dynamic management view, sys.dm_db_file_space_usage, that keeps track of how much space in tempdb is being used for each of the different kind of object stored in tempdb.

I understand that this is the first time you wrote with a team of other writers; can you tell us something about that experience?
I initially thought that not having to write the whole volume by myself meant that I could get it done sooner, but that was not the case. Everyone had their own schedule and their own way of writing. The personal aspect of working with the other authors was great. I deeply respect all of the others and it was an honor to be working so closely with them. I had some concerns about the depth of coverage and I wondered whether all the chapters would end up being as deep as I hoped for, but that turned out not to be a major problem. The only real issues were agreeing on a common terminology and coding style, and even that wasn’t that big of an issue, because I got to do a final editing pass on everyone’s chapters.

What SQL Server books are on your bookshelf?
All of the Inside SQL Server books are there, of course, and all of Ken Henderson’s books. Bob Beauchemin’s book is in my car, to read while I am waiting for the ferryboat, and while on the ferry. I also have technical books that aren’t SQL Server specific, such as Jim Gray’s Transaction Processing, Russinovich’s and Solomon’s Windows Internals, Chris Date’s Introduction to Database Systems and Mike Stonebraker’s Readings in Database Systems.

Why do you write technical books?
I love working with SQL Server and trying to find out all I can about it. When I found that I could explain difficult concepts in a training environment, I thought I could do the same thing in a written format, and reach more people that way. I have always loved explaining things, ever since I was a teaching assistant for High School Math.

Will you be updating your books for SQL Server 2008?
I have just started meeting with my editors at Microsoft Press about SQL Server 2008, and it looks like a revision is in the plans. We’re really looking at it as just a revision, with the same structures as the current books, with straightforward changes and the inclusion of new features.

Name three things that are new in SQL Server 2005 that you find are the most valuable?
Dynamic Management Views, Dynamic Management Views, and Dynamic Management Views!
Oh, you wanted three different things? ;-) How about XML query plans and optimization hints. (I’m also very fond of many of the new TSQL constructs, but I was only talking about things that I cover in my new book.)

Name three things which are coming in SQL Server 2008 that you are most excited about?
You’ll have to ask me this next time. I have actually been avoiding SQL Server 2008 while I was getting my Query Tuning and Optimization book finished, because I didn’t want to get distracted.

Can you list any third party tools that you find useful to have as a SQL Server developer/admin?
I’ve tried a few other products, but usefully I find that it is much easier to just stick with the Microsoft line and use the tools provided with the product.

Name some of your favorite non-technical books.
Oh, I love to read. It would be impossible to list my favorite books, but I can tell you my favorite authors, most of whom write science fiction: Lois McMaster Bujold, Ursula LeGuin, Sheri Tepper, Orson Scott Card, Octavia Butler, Elizabeth Moon. I also love to read historical fiction like Leon Uris.

Monday, August 06, 2007

Becoming A Better Programmer In 6 Months: The First 20 days

Here is an update of what I accomplished in the first 20 days

Read the book lifehacker
Read the book Microsoft SQL Server 2005 Integration Services
Read the book Extending SSIS 2005 with Script
Read 1 chapter of Learning Python, Second Edition
Played around with the July CTP of SQL Server 2008

So in the first 20 days I have read 3 books however two books are very thin. I will need that time later when I start on much thicker books like Code Complete and Inside Microsoft SQL Server 2005: T-SQL Querying

I will also make a small change to the list instead of Expert SQL Server 2005 Integration Services I will read Core Python Programming

I also started tinkering with Python, those guys are a bunch of jokers. if you type "import this" in a Python command line window you get this output

win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>

So that is one of the easter eggs hidden in Python.

In the past week I also played around with the new date data types in SQl server 2008, I have filed a bug/typo which I found in Books On Line.This week I will concentrate on the book Learning Python, Second Edition during weekends and lunch hours, in the evening I will read Practices of an Agile Developer

This is it for the update. The original post can be found here: http://sqlservercode.blogspot.com/2007/07/become-better-developer-in-6-months.html

A more detailed post about the first 10 days can be found here: http://sqlservercode.blogspot.com/2007/07/becoming-better-programmer-in-6-months.html

I am also glad to say that most of the people I tagged in the original post have responded

Wednesday, July 25, 2007

Becoming A Better Programmer In 6 Months: The First 10 days

Here is an update of what I accomplished in the first 10 days

Read the book lifehacker
Read the book Microsoft SQL Server 2005 Integration Services
Read 126 pages of Extending SSIS 2005 with Script
Installed PostgreSQL, Python, Eclipse and Django.

Now you may ask yourself how I could have read all these things in 10 days. This is because I have to convert a whole bunch of packages from DTS to SSIS. So I did read a lot at work about SSIS. As you can see I sneaked the Extending SSIS 2005 with Script book in there which was not on my original list. I actually did all the example in that book. SSIS is pretty cool, the only thing which was frustrating (at first) was that you cannot modify a connection string with script like in DTS. However you can use Package Configurations to do that. This is important if you have to import a daily Excel file with a different filename every day. So as your first step in your package you just update the configuration table. Here is a small example

DECLARE @i char(8)
SELECT @i = CONVERT(CHAR(8),GETDATE()-1,112)

UPDATE dbo.[SSIS_Configurations]
SET ConfiguredValue = 'E:\SSISExcel\ida' + @i + '.csv'
WHERE ConfigurationFilter ='CSV'
AND PackagePath ='\Package.Connections[FlatFileCSV].Properties[ConnectionString]'

I will write a blogpost with more details and screenshots within the next couple of days.

I though the Microsoft SQL Server 2005 Integration Services book was pretty good. I saw some mixed reviews on Amazon but I do not agree with that at all. The book is well organized, easy to read and the examples are easy to follow. I recommend this book to anyone who has to learn SSIS.

Another book I read is lifehacker, this books shows hacks that you can use to improve your technical life. One of the hacks that I have implemented is the JunkDraw hack. You create a folder called JunkDraw, this is where you save all your downloaded content. Then there is the VB Script which is scheduled to run once a day and deletes all the files which are older than 2 weeks from this folder. So if you downloaded something and you did not move it from the folder it will be gone. How many files/apps/trial/beta apps have you downloaded, moved to a folder and never looked at again? Exactly this will prevent that kind of clutter.

I mentioned that I would like to learn a new language, so I went a little overboard because in addition to a new language I have also chosen a new database and a framework. The language is Python which was created by Guido van Rossum. Python is a scripting language and pretty popular among the FLOSS guys/girls. This of course will prepare me to play around with IronPython and the DLR once that is finalized. The DB I picked is PostgreSQL, I have chosen PostgreSQL instead of MySQL because I just can’t install a DB where you can enter invalid days. Another reason is that PostgreSQL is recommended with the framework that I picked. I picked Django over TurboGears and Ruby on Rails because I have heard some good things about it, one of them being performance. So last Sunday 5AM I installed PostgreSQL, Django, Python, Eclipse and the Eclipse Python plugin Pydev on a windows box and got the initial setup to work.

I will keep you posted on my progress once every 10 days but so far it is going good ;-)

Here is the link to the original Become a Better Developer... in 6 months article

Monday, July 16, 2007

Become a Better Developer... in 6 months

I just listened to the latest Hanselminutes podcast: Be a Better Developer in Six Months
Scott Hanselman asks “what are you going to do in the next 6 months to become a better developer”?
He suggest reading books, nerd dinners, having lunches together with other non competitive companies, watch webcasts together during lunch and code reading.

Here is what I am going to do for the next 6 months

I am going to read a technical book every 10 days and review every single book

That should be possible now that my twins are one year old (tomorrow). I have a bit more free time at night to read. Here is the list of books, some of them I have read, some I have partially read.

Code Complete (reread)
I think this is one of those books that you should read once a year.

Practices of an Agile Developer
Some good stuff in here, in ordered it a couple of months ago but did not read it yet.

Inside Microsoft SQL Server 2005: T-SQL Querying (partial reread)
I read several chapters but did not read the whole book.

Inside Microsoft SQL Server 2005: The Storage Engine (reread)
I have read parts of this one; I have read the 2000 edition several times.

Refactoring (reread)
I was thinking Design Patterns (GOF) or this one. As you can see I have chosen Refactoring.

Prefactoring
Why refactor when you can prefactor? I just skimmed through it in the book store and it looks promising.

Open Sources 2.0
Open Sources 2.0 is a collection of insightful and thought-provoking essays from today's technology leaders that continues painting the evolutionary picture that developed in the 1999 book Open Sources: Voices from the Revolution.

Pragmatic Unit Testing in C# with NUnit
New edition.

Building the Data Warehouse (reread)
Read this one several years ago, will read it again

Expert SQL Server 2005 Integration Services
Will read this together with the one below at work; have to convert about 60 DTS packages to SSIS.

Microsoft SQL Server 2005 Integration Services

Beautiful Code
In this unique and insightful book, leading computer scientists offer case studies that reveal how they found unusual, carefully designed solutions to high-profile projects. You will be able to look over the shoulder of major coding and design experts to see problems through their eyes.

Pro SQL Server 2005 Database Design and Optimization (reread)
Read this will read it again

The Art of SQL
Heard some good stuff about this book.

Getting Things Done
We all need some help with organizing our lives.

Lifehacker (reread)
Getting ThingsDone for the computer person, very useful stuff inside.

Framework Design Guidelines (reread)
Very nice book, you will learn why something was done a certain way. Good tips on what to avoid and what should be done.

New language Book probably Python or Ruby( you decide)

Here is a pic of the books I have at home, the others I have at work or I still have to purchase them.

I will watch 2 web casts a week during lunch time and review those also.

I will look at high quality source code from open source projects and also from the book Beautiful Code. I will go to CodePlex to download a couple of open source projects and will study the source code

I will learn a new language (I actually got this from Ken Henderson who suggests to learn a new language every year) and rewrite one of the current applications in that language. This way I don’t have to worry about logic problems and design, I just have to translate the code.

I will learn a new technology. I am thinking either WCF or WPF

I will keep updates on Pownce (sorry folks no invites left) everyday The reason I am doing this is so that someone can call me out in case I don’t keep my promise. This is similar to stopping smoking but not telling anyone, if you do that then who knows you stopped so that they can confront you?

I know this is cheesy but I will do it anyway, I will tag 5 people I (kind of) know and I want them to tell us their plans.
Adam Machanic
Louis Davisdson
Peter DeBetta
Mladen Prajdic
Hugo Kornelis

And I will tag 5 people whose blogs I read but I don’t know them
Jeff Smith
Jason Gaylord
Jeff Altwood
Matija Lah
Ward Pond

And you the reader, what will you do in the next 6 months to become a better developer?

Cross posted from here: http://sqlblog.com/blogs/denis_gobo/archive/2007/07/16/1746.aspx

Monday, June 18, 2007

Which SQL Server 2005 Analysis Services Book Should I Get?

Who can recommend a good SQL Server 2005 Analysis Services book?

I know SQL Server 2000 Analysis Services but did not work with SQL Server 2005 Analysis Services yet. As a matter of fact I haven’t touched Analysis Services in 2 years. I don’t need a book which explains what a start or join schema is, I know what a slowly changing dimension is, I also know the difference between a fact table, a dimension table and MOLAP/ROLAP/HOLAP.

The 2 books that I used previously are the WROX book (Professional SQL Server 2000 Data Warehousing with Analysis Services) and the MS Press Step by Step book. I remember liking both of them, are their successors as good?
The problem with the reviews on Amazon is that it doesn’t match my expectations; the WROX 2000 book only got 2.5 stars which I think is way too low. While the book is not perfect it deserves more than 2.5 stars.

Here are the 5 books I am considering

Microsoft SQL Server(TM) 2005 Analysis Services Step by Step

Delivering Business Intelligence with Microsoft SQL Server 2005

Professional SQL Server Analysis Services 2005 with MDX

Microsoft SQL Server 2005 Analysis Services

The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset

BTW I don’t need the book right now, I won’t actually start working with this until next year. If you know of a book that is coming out between now and January please let me know also.

Thursday, June 14, 2007

Book Review: Expert SQL Server 2005 Development By Adam Machanic

If you are an advanced or intermediate SQL Server developer then this is the book for you. Adam understands real world scenarios and understands that databases are part of a bigger group in the business world. The database is usually the most important asset in an organization. All your data is in the database, you need to secure it, this is where encryption, privilege and authorization comes in. The ratio of web servers to database servers is usually many to one, it is easy to scale out with web servers however with database servers this is not so easy. This is a reason why your code needs to be optimized and designed for application concurrency.

I recommend this book to any intermediate or advanced SQL Server developer. This book is not a book that is like the other book you have but 2 chapters are different. NO, this book contains a lot of good info which is not available in other books. I learned a lot from reading this book and you will too. Here is the breakdown of what is covered in the chapters.

Chapter 1 Software Development Methodologies for the Database World
Adam explains what Cohesion, Coupling and Encapsulation is, where the business logic should live and the balance between maintainability, performance, security and more.

Chapter 2 Testing Database Routines
This chapter is worth the price of the book by itself. You will learn how to unit test your procedures, evaluate performance counters and this chapter introduces the SQLQueryStress Performance Tool (see picture below) which will be used in other chapters. This is a very useful tool if you have to tune a query. How many times do you set statistics time and statistics IO on and off to see the reads and CPU time? This tool does it all for you, paste in your query or proc call, specify how many times you want to run it that is it. This tool will save you many stressful (pun intended) hours

Chapter 3 Errors and Exceptions
This chapter explains the different type of exceptions and how to do error handling. You will also find out what a ‘doomed transaction’ is, this is the one where you get this user friendly message: “The current transaction cannot be commited and cannot support operations that write to the log file. Roll back the transaction.”

Chapter 4 Privilege and Authorization
This chapter explains what impersonation and ownership chaining is. Also covered is how to use EXECUTE AS and how to sign procedures.

Chapter 5 Encryption
This chapter will explain encryption to you in a clear and concise matter. You will learn how to improve performance by using Message Authentication Code. The difference between symmetric and asymmetric key encryption is covered as well as all the terminology that is needed to really understand encryption.

Chapter 6 SQLCLR: Architecture and Design Considerations
What this chapter covers is SQLCLR security, why to use SQLCLR and how to enhance Service Broker Scale-Out with SQLCLR

Chapter 7 Dynamic T-SQL
You want to protect your data? Then this is something you have to read. You will learn how to deal with sql injection, why sp_executesql is much better than exec and the performance implications of parameterization and caching.

Chapter 8 Designing Systems for Application Concurrency
If you are running an OLTP system and you are suffering from blocking/locking then this is the chapter for you. Isolation levels and how they affect concurrency is explained. This chapter uses the SQLQueryStress Performance Tool to show you the difference it makes in performance when you slightly change your proc.

Chapter 9 Working with Spatial Data
Spatial data, this is what a lot of people are storing these dates, unfortunately calculating the distance between 2 points is not as easy as it seems (the earth is not flat you know ;-( ) This chapter covers a couple of ways to represent Geospatial Data.

Chapter 10 Working with Temporal Data
Dates are everywhere in the database but unfortunately a lot of people do not know how dates are stored internally and how to write efficient queries which will cause an index seek instead of a scan. Calendar tables, time zones and intervals are all covered in this chapter

Chapter 11 Trees, Hierarchies, and Graphs
The difference between Nested Set Model, Persisting Materialized Paths and Adjacency list Hierarchies are explained. There is code included that shows you how to traverse up or down the hierarchy, insert new nodes and much more.

Amazon Link: Expert SQL Server 2005 Development

I have also interviewed Adam Machanic a while back, you can find that here: Interview with Adam Machanic Author Of Expert SQL Server 2005 Development

Wednesday, May 23, 2007

Book Review: Pro SQL Server 2005 Database Design and Optimization

Pro SQL Server 2005 Database Design and Optimization
By Louis Davidson, Kevin Kline, Kurt Windisch
ISBN: 1590595297
672 pp.
Published: May 2006

Before I start with the book review here is a little story that took place in 1995. A former colleague wanted to open a bar in the East Village (New York City). He looked and looked and found this place somewhere on 9th street. He looked around the place and liked what he saw. Then he went to the back and saw about seven wooden beams against the wall. When he asked about them the owner said that is to keep the wall from collapsing. Anyway my friend settled on another place and opened his bar in a former parking garage, the name of the place is Liquids and it is located on east 10 Street between First Avenue and Avenue A.

Now why am I telling you this story? This story happens in the database world all the time, think of the data model as the foundation of your house if your foundation is weak you need kludges (wooden beams) to prevent your structure from collapsing. How many times do you see on newsgroups questions relating to split string because someone stored a comma delimited string in a column instead of having a lookup table? You have to start with the proper design, once you have that everything else becomes so much easier. It is easy to change a proc or some code somewhere. Once you have to redo your design then you have to also go through all the code and make changes and then also pray that you didn’t break anything. So this is what this book is all about; to teach you how to properly design a data model for speed, concurrency, security, interoperability, integrity and much more. There is no need for triggers when you can use check constraints which in turn will perform much better.

I got my copy of the book the day after our twins were born, that is a little more than 10 months ago. I have read the book many times but I did not read it in the order that it is written in. However I do recommend reading at least the first 5 chapters in the correct order, this is especially true if you are new to databases.

So what do I think of the book? I think that this book is excellent, it is written in a way that is not dry, and you will actually laugh a couple of times when reading this book. The book is written in such a way that it goes into as much detail as necessary to explain the concept and also provides links where needed for material that is not in the scope of the book. Most chapters end with a best practices and a summary section which is very useful if you want to find out what the chapter is about without reading the whole chapter first.

The book has 11 Chapter and 2 appendixes which are listed below

Chapter 1 Introduction to Database Concepts
Chapter 2 Data Modeling
Chapter 3 Conceptual Data Modeling
Chapter 4 The Normalization Process
Chapter 5 Implementing the Base Table Structures
Chapter 6 Protecting the Integrity of Your Data
Chapter 7 Securing Access to the Data
Chapter 8 Table Structures and Indexing
Chapter 9 Coding for Concurrency
Chapter 10 Code-Level Architectural Decisions
Chapter 11 Database Interoperability
APPENDIX A Codd’s 12 Rules for an RDBMS
APPENDIX B Datatype Reference
INDEX

The name of the chapter indicates what the chapter is about, I won’t go into detail by reviewing every chapter.

If you are completely new to SQL then this is the book for you because a good design is the foundation of everything. Now there are two ways to find this out the first way is the easy way. You purchase the book, read it and use the code. The second way is more painful. You have a weak database foundation, performance is bad, customers complain, your boss is breathing down your neck and the only shows you watch on TV are the late shows because you have to update statistics and reindex your database between 7PM and 10PM every single day to make it perform somewhat responsive. If you are an advanced/intermediate developer don’t think that this book is not for you. It is! I have learned a bunch of new tricks/tips which I would not have if I did not read the book. Obviously I am not the only one who praises the book; just take a look at the reviews on Amazon

If you want check out the book first here is the link to the sample chapter: http://www.apress.com/book/supplementDownload.html?bID=10005&sID=3490

Here is the link to Louis’ blog where he posts updates about the book http://drsql.spaces.live.com

I did an interview with Louis a while back, you can find that here

The nest 2 reviews will be Itzik’s and then Adam’s book. I think that Itzik’s will be next week and Adam’s the week after that.

Wednesday, May 16, 2007

Interview with Adam Machanic Author Of Expert SQL Server 2005 Development

I am a big fan of Adam Machanic and the way he thinks about SQL server development. How may SQL developers unit test their procedures or use assertions in their code? Well I can guarantee that there are very few of those. I was very excited when Adam Machanic contacted me to give me a copy of Expert SQL Server 2005 Development for review. I did not finish reading the whole book yet but what I have read so far is very good. If you are an advanced developer and want to kick it up a notch then this is the book for you.

The question-and-answer session with Adam that follows was conducted via email.

What is the audience for this book? Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
The book is certainly not for beginners; I make a lot of assumptions about the level of knowledge possessed by the reader. The reader should understand the basics of writing T-SQL, creating indexes, reading query plans, etc. It's difficult to differentiate between intermediate and advanced users, as everyone has their own opinion of what those levels actually mean! I've covered several not-so-common topics, and my hope is that that almost any SQL Server practitioner who has advanced beyond the basics can learn something from the book.

What are the most important things a person can do to master Transact-SQL?
Practice, practice, and more practice! The more complex queries you write, and the greater the variety of problems you solve in T-SQL, the easier it gets. Just like other programming languages, T-SQL programmers use and reuse various patterns as they work on different projects and problems. Gaining skill in T-SQL is really nothing more than a process of learning when and how to properly apply the patterns to any given situation.

Adam, the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?
When I first started working with SQL Server and other Microsoft technologies I discovered Microsoft's news server and was shocked to discover the overall quality and high signal-to-noise ratio compared to many other (Usenet) newsgroups I'd previously frequented. Questions I asked were actually answered--by people who knew what they were talking about, no less! After spending quite some time asking my own questions I realized I could answer a few here and there too, and give something back to the community that had helped me so much. Over time I became hooked. What a great way to kill free time (especially while waiting for software to compile or long queries to run)--not only does it feel good to help others solve their problems, but it also helps to strengthen my skills as a developer. As I mentioned in the previous question, the most important key to software development greatness is practice. By answering newsgroup and forum posts, you can get exposure to all sorts of business problems and scenarios that you would otherwise never see. There is no better learning experience that I have found.

Why is it important to think in sets instead of procedurally?
It's often said that thinking in sets takes advantage of the declarative nature of SQL; and while this is certainly true I think it's important to drill down to why that's such a big deal. SQL query performance, generally speaking, comes down to I/O performance. Too many I/Os, or an I/O-bound system, and your query won't return fast enough--simple as that. When we write set-based queries, we allow the query engine to optimize in order to minimize the number of index requests--and, therefore, I/O operations--necessary to satisfy the query. Working procedurally, with loops, means that the opposite happens--the maximum number of index operations will happen on every invocation, and therefore the query will perform in constant time—at the worst possible level.

Why do you feel it is important to properly unit test procedures?
I cover unit testing in some detail in Chapter 2 of the book, including when and why you want to do it. But what it boils down to is that creating a solid unit test infrastructure helps to guarantee that as you change your code over time, you won't break anything--and if you do, you'll find out about it very quickly. Studies have shown that by far the most expensive part of the software development process is maintenance. This is due to the fact that as we change our software, we break other pieces (dependencies)--and oftentimes, those dependency breaks silently cause subtle issues, that aren't noticed until long after the change was actually made. This can make it extremely difficult to track down and fix the root causes of problems. Comprehensive unit testing makes it possible to eliminate a lot of that delay, by alerting the developer right away if something doesn't look good. This, in turn, results in better quality software, happier customers, more business, and as a result, bigger bonus checks. A win-win situation, if you ask me.

Which chapter was the hardest to write and can you explain why?
Chapter 10, "Working with Temporal Data," took me four months to write. It's my favorite chapter in the book, and one of my favorite topics to think about, but it covers a huge variety of scenarios. As I mention in the chapter, temporal data finds its way into every single database, so it's a large and far-reaching topic. Attempting to cover it in a fairly complete manner turned out to be quite a big task!

Why do you write technical books?
Mainly for the groupies. But aside from that obvious perk, I do it simply because I love sharing technology and techniques with others. Writing a book is similar to participating in a newsgroup or technical forum, but on a much deeper level. Whereas I might answer a question on a forum in 400 words, or write a 3,000-word article, in a book I can cover the same topic in 15,000 or more words--really guide the reader through all of the pitfalls and ins and outs of a given area, and do things that would be impossible in another media (except, perhaps, in a long series of blog posts or articles). Plus, let's be honest--it's just a real thrill seeing my name on the cover of a book!

Who are your favorite authors?
For technical titles, my favorites are Steve McConnell ("Code Complete," "Rapid Development"), Joe Celko ("SQL For Smarties" and others), and Andrew Troelsen ("Pro C# and the .NET Platform"). These authors manage to convey heavy technical topics in very readable formats--which is something I aspire to doing in my own writing.

Unfortunately, I haven't been reading much fiction recently. The last really great novel I read was "Cloud Atlas" by David Mitchell. I'm also a longtime fan of science fiction writers such as William Gibson ("Neuromancer") and Neil Stephenson ("Snow Crash," "Cryptonomicon").

I noticed you have a chapter that deals with application concurrency, do you feel that developers don't have a multi-user/multi-app mindset when first developing the data model/procedures?
With the recent trend towards multiprocessor and multicore machines, threading and synchronization have been hot topics. I've read lots of interviews and articles about how many developers don't understand multithreading well enough. Yet as database developers, I feel like we've been there for years--the database is almost always a shared resource, and managing synchronization is no less important there than in the application tier. A lot of the time we tend to think about concurrency merely in the context of isolation levels and the like, but there is a lot more to it than that--an application will often push much of its synchronization logic all the way into the data tier (whether on purpose or not), and how to properly handle things is really more of a business question than a technical one. When writing that chapter, I tried to frame the issues in terms of business requirements, and then show how to use SQL Server to satisfy them. I think that developers are sometimes prone to getting too excited about technology and trying to find a business problem to which they can apply some new technique--the whole "hammer looking for a nail" mentality--rather than the correct route of determining the business problem and then using technology to solve it.

What SQL Server books are on your bookshelf?
I've now gotten rid of most of my SQL Server 2000 books, except for Ken Henderson's Guru's Guide" series and Kalen Delaney's "Inside SQL Server 2000". My bookshelves (note the plural!) are filled with SQL Server 2005 books--for the first year or so after its release I managed to get almost every new book that was released. I eventually slowed down a bit, but it's hard--there is a lot of great material out there! I'm a really big fan of the "Rational Guides" -- short, concise, yet in-depth guides to just one topic, generally very well-written and easy to digest.

Why did you decide to include a chapter on spatial data?
Through nothing more than luck of the draw I ended up on a couple of projects involving heavy spatial queries, and found that it was an area I really enjoyed. I love to solve difficult problems, and since comparatively very few people are doing spatial in SQL Server, I found that there was a lot of room to innovate--so I ended up having a great time with these projects and totally got hooked. I feel that spatial data is an area we're going to see a lot more of in the future. Spatially-enabled devices are becoming more and more prevalent, and of course all of those devices are going to need data. Unfortunately, due to time constraints I was not able to write that chapter, so my friend Hugo Kornelis tackled it. Hugo did an absolutely fantastic job of covering the topic--I think it's a great chapter, and will help readers get right up to speed.

Are you still making your own beer?
I live in a fairly small apartment at the moment and am somewhat cramped for space as well as time, so I haven't brewed a batch beer for the last couple of years. However, I have been experimenting with ciders quite a bit, and have made a variety of batches using all sorts of adjuncts including honey, brown sugar, and malt extract. Cider is very forgiving and needs to sit around for a long time before it's drinkable, so it's easy to make in a small space with limited time. I've also made a few batches of ginger beer, using a recipe that we actually designed in an online SQL Server forum!
(http://www.simple-talk.com/community/forums/thread/1232.aspx )

Table Of Contents
CHAPTER 1 Software Development Methodologies for the Database World
CHAPTER 2 Testing Database Routines
CHAPTER 3 Errors and Exceptions
CHAPTER 4 Privilege and Authorization
CHAPTER 5 Encryption
CHAPTER 6 SQLCLR: Architecture and Design Considerations
CHAPTER 7 Dynamic T-SQL
CHAPTER 8 Designing Systems for Application Concurrency
CHAPTER 9 Working with Spatial Data
CHAPTER 10 Working with Temporal Data
CHAPTER 11 Trees, Hierarchies, and Graphs
INDEX

Sample: Chapter: Ch. 01 - Software Development Methodologies for the Database World
Amazon Link: Expert SQL Server 2005 Development

Wednesday, November 29, 2006

Interview With SQL Server MVP Louis Davidson: Author Of Pro SQL Server 2005 Database Design and Optimization

After interviewing Ken Henderson I decided to try my luck with Louis Davidson. By now you already know that he said yes ;-). Louis is the author of Pro SQL Server 2005 Database Design and Optimization and a SQL Server MVP. I have a copy of his book and I highly recommend it to anyone who wants to learn about designing and optimizing databases. The question-and-answer session with Louis that follows was conducted via email.

Louis, the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?

I have been involved with newsgroups since back in college when I was an avid reader of rec.music.beatles. And when I found the SQL Server groups (back then, before Microsoft’s groups it was comp.databases.ms-sqlserver), I started reading them, and finally giving some feedback. I then spent time in the newsgroups trying to answer people’s questions to help them, and help me extend my knowledge past school and technical books; because the real situations that people get into are more interesting than anything you can get in a textbook.

I quit hanging around for quite a while because the public newsgroups (just got too doggone noisy. I mean come on, there is only so often that one can have certain, “personal” products sold to them.

But about three years ago, I went to a session with Kalen Delaney (who was then, and is now one of the people who I look up to greatly,) who was talking about things to do to promote your books. I started then trying to answer three questions a day (or more) on the Microsoft newsgroups (instead of the public ones) and now on the MSDN Forums (http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=85&SiteID=1).

I don’t know that it helped my book sales in any overwhelmingly large way, but it certainly has helped me in my career. I find that so many things I have learned while helping others end up being useful at my day job. Plus, my philosophy in regards to SQL Server is simple: the more people who like SQL Server the more likely that the product will be around when I retire in 30-40 years.

I have noticed that you have explained the 12 rules of Codd, do you think that it is important to know these rules as a database developer?

Absolutely, if you don’t know why you do something, then it is just too easy to think that it isn’t worth it. And you can’t really bend the rules until you know them. Codd’s rules are the foundation that the RDBMS was built upon, which the begat normalization. Once people get that SQL Server is essentially a tool largely created to manipulate and serve up normalized data (a concept that was built upon the original 12 rules) it actually becomes essential to normalize, not just something that some square geezer is telling you that to do.

Why did you decide to write technical books?

Two reasons:
1. So I would have a reference to refer to when I need to recall something.
2. Because I hate having free time.

Seriously, I had really done quite badly in my senior level database design class in college, but good enough (I think I made a C.) So when I was thrust into the database programmer role, I wanted to learn how to do it right (something my father had not so subtlety driven into my brain years before.) What made things even better was that I had several mentors who believed in doing things right during my first six or seven years of programming, so I learned essentially only good practices, but I still wanted more. I couldn’t find a book on database design that was not written as a text book, so I suggested a book on database design to the editors at WROX, whom I was working for as a tech editor on a Visual Basic/SQL Server book. I expected to tech edit this design book, or possibly write a chapter or two, but before I knew it they had contracted me to write the entire book (Chuck Hawkins wrote a chapter for me, but I wrote the rest) and paid me what seemed like a ton of money (of course, if I average it out to an hourly wage, I averaged far less than minimum wage when all was said and done.)

The process of writing the book was an amazing journey, one because I had never written a paper greater than like 15 pages (my degree is in CS, so all of our papers were really quite small in comparison) and I had to learn a good deal of stuff along the way because I was not really an “expert” at the time (and the more I know, the less I feel like an expert now!) The thing about writing is that you have to be more correct than in almost any medium. When you speak on a subject, if you say something, the people in the audience go, “whu?” and if you talk fast enough and right enough, they will likely forgive you (as long as you don’t do it again.) But if you mess up in writing in a book, it just hangs there over your head forever.

When it was over I had something I was proud of and I said what every technical book writer I have talked with says: “NEVER AGAIN! Too much work, not enough gain, I could make more money cleaning out stalls at the local pet store!!!” I have now said more or less the same thing three times now, and I hope to say it again around the time Katmai ships.

Which chapter was the hardest to write and can you explain why?

Well, the ones that were the hardest were actually the easiest, since I got the Kevin Kline and Kurt Windisch to write them. These guys were excellent co-authors, and did great work. Kurt’s section on CLR programming is great, with some awesome example code that I often use when I need to build a CLR function. Kevin wrote a chapter about cross-platform development that rounded out the book.

Of the chapters I wrote, the most troublesome was probably the security chapter. Security is a very complex topic and there are many, many ways to enforce security. Even worse, there are too many ways to get around it once you have it set up. It is the one chapter that I am not completely happy with, and the one that I plan to do some writing on again in the next few months, most likely after the holidays, during which I hope to do very very little that isn’t concerned with eating, sleeping, or going to Disney World.

What is the audience for this book?

Well, with this being my third book I was kind of hoping that my audience had grown to Dogbert
-like proportions, but I have to be reasonable. I would simply be satisfied if I could get every SQL Server programmer to read it (and buy it!) The overall goal was to make a practical guide going through the boring bits (theory, modeling), the stuff you hear about but rarely really understand (normalization) and to include the more practical topics like constraints, triggers, security, etc that even people who have years of experience would like.

I think that most people will get something from the book, from the newbie to the pro. Of course the payoff will be far less for someone who already knows the theory, but there is something in there for everyone that will at least be worth your 60 bucks (which is the full retail price. Don’t pay full retail price!)

I also try to keep it as light as possible, considering the very dry topic. There are several little humorous bits thrown in, and for fun I added a hidden message for a contest to run later (actually going on right now in the end of 2006.) All of the clues are leading to a message that I “encrypted” into a bit of code in the book (and yes, that is bonus clue for reading through this really long interview!) for a bit of fun.

Ultimately, I want the book to be something that I would purchase in each version at the very least for the practical aspects of the book. You might think that since I wrote the book I wouldn’t refer to it, but I often use the templates for triggers, procedures, error handling, etc from the book whenever I need to write one without using some tool.

Which SQL Server books are on your bookshelf?

Well, to tell the truth, I have quite a few on my desk and no time to read them. I have thumbed through Itzik’s book, “Inside Microsoft SQL Server 2005: T-SQL Programming”, and it is pretty darn good. Dejan Sarka, who was one of my technical editors worked on that book too. Beauchemin and Sullivan’s “A Developer’s Guide to SQL Server 2005” seems pretty good, and I have used Tachev’s book, “Applied Microsoft Analysis Services 2005” several times to try to figure out how different Analysis Services was in the 2005 version.

And I would be remiss to not mention Kalen’s previous “Inside SQL Server 2005” book. What a great reference that book was. I can’t wait to get my hands on her new books (not that I would have the time to read them either!)

What are/could be some consequences if your data model is not in Third Normal Form or even First Normal Form?

The earth could stop spinning on its axis, at least that is what I try to tell people. The more reasonable answer is that you will have nagging, annoying issues with your data that will require more and more management just to keep straight.

Normalization pushes data consistency issues to the foreground and forces you to deal with them. For example, take first normal form. One of the rules of first normal form is that every row is unique. A constant question on newsgroups/forums is: “I have duplicate rows, how do I get rid of them?” Well, go through this following messy set of steps, and then PROTECT YOURSELF USING KEYS!
Another concern of first normal form is scalar attributes. I like to define this as keeping your data broken down to the lowest form you will manipulate using SQL. So if you have to use substring for any common operation, they you likely have an issue. For example take an address, if you use addresses just as the user types them in, then just have a text column, put the address in it. But if you have a mailing system and you have to cleanse the data, and then break the address down into all of its constituent parts instead of going through the process every time you need an address. SQL is great for taking bits and pieces and constructing what you need, but lousy at breaking things down.

For third normal form, the problems are more ominous. If the data in the table isn’t really describing the thing that the table is modeling, you will end up with redundant data. Redundant data doesn’t sound so bad, until you have need to update the data. Consider the case where you have a table modeling a person, and in this table you have the person’s manager, the manager’s name and address. From this sentence, it almost sounds plausible and correct. It certainly is something you might want to see on a screen. But is the manager just the manager for the one person? If not you might have ten copies of the managers information. Now you have to make sure and deal with all ten copies, which is a real pain, to say the least.

How did the massive changes between SQL Server 2000 and SQL Server 2005 affect the research for your book?

The beauty for me was that the truly massive changes made little difference. Since the book I have written is about database design, things haven’t changed so incredibly much from that angle in any version of SQL Server. SQL evolves and gets better and there are optimization changes, but the basics remain the same. Fortunately as machines get more powerful we need to tweak performance less and less and we can just focus on data integrity and usability through normalization.

The biggest differences that I am interested in from an authoring standpoint have been in the ability to do more with T-SQL (I wrote the T-SQL for Developer’s chapter in the Pro SQL Server 2005 book for Apress), and now using the CLR functions. I had Kurt Windisch (a friend who was a board member for PASS at the time) write a very large CLR section, and I am so glad. Now I have a nice reference and excellent code samples for CLR objects. It is one of the great things about a book like this in that it is not overview book where you simply give the new features and come up with some reason you might want to use the features. If the feature isn’t tremendously useful, we can say that it isn’t useful. I think we got it pretty much right by saying that functions are really the most useful part of the new CLR objects, as well as to a lesser extent aggregations.

I admit that I don’t cover XML, something that I lament occasionally. I had waffled on including it for months because I just don’t feel that it is that important to OLTP systems, but on the other hand it could certainly be useful to implement a set of sparse properties (a topic I do touch on) and had arranged for another coauthor set up to write me a chapter about XML, but they dropped out too late to get another one. Even today I am considering commissioning an “add-on” chapter about XML to send out to readers (if you want it, vote with your email to drsql@hotmail.com and your wallets by purchasing the book so I can afford to pay someone. )

Name three things that are new in SQL Server 2005 that you find are the most valuable?

Dynamic Management Objects. Wow, I can’t even begin to start on just how great these are. The best one to me is outlined in this blog entry: http://drsql.spaces.live.com/blog/cns!80677FB08B3162E4!1443.entry . Automatic index tuning is awesome. I have used the basis of this system to optimize a third party system in excellent ways using the number two feature.
Included columns on indexes. These have really made it easier for dba’s to optimize queries by building covering indexes for queries without bloating indexes to death. What’s more, because you aren’t limited to 900 bytes, you can cover some queries in larger, more unwieldy systems that you just couldn’t do before. They aren’t the end all as they still have to be maintained, but they are certainly excellent in many situations. We have used them extensively to optimize a third party system that was very heavy on reads to cut down tremendously on full table scans.
The new BI Paradigm. The attribute oriented nature of Analysis Services is just plain cool. It gives users this extremely rich set of possibilities for viewing data using their cubes that is just so much better than they had in 2000. It was a tremendous change in gears though, and I am very glad that the fundamentals of dimensional modeling didn’t change, so the Kimball paradigm data warehouse we had was still quite viable using either platform, just easier to use.

Name three things that you wish would be in Katmai (the next version of SQL Server)?

I should have finished this interview a few weeks earlier, and I could have spoken a bit more freely about this subject, but at this point I am under NDA for what I have heard about Katmai. What I will say is that I hope to see as much work done in the relational arena (T-SQL and the engine) as they do for the other areas like XML, or tools. The heart and soul of SQL Server should always be right there, lurking in the name: SQL. All of the other new stuff they have added over the years has made things better for us, but I would have traded many, many non relational features to have had the ROW_NUMBER() and RANK() functions in an earlier version.

Of course, this is just my very selfish view on things as a Data Architect/Relational Programmer. In this particular case, even more than normal, I particularly speak only for me.

How critical is a proper data model for a good performing database?

Freaking essential! This is like asking how important is it to know the race route to win the race. The SQL Server engine is amazingly good at serving up data, but it is up to the implementer to know what data is needed and to put the data into a format that works the way that SQL Server works.

I would also be remiss if I didn’t take this one step further. How can you build anything without design? Not one programmer would call up a contractor and say “build me a house, three bedrooms, two baths” and then just let the contractor sketch out a rough drawing and then start working would they? And while I realize that it seems that software is a lot more pliable than building a house, once you have built the foundation classes, the database, and started a UI, rework can be just as costly as running a bulldozer though a regular house, and on a particularly a large software project, far more expensive.

What are some of the biggest mistakes that people make when designing a database?

I did something this for Greg Low’s really cool SQL Down Under podcast (where you get a good idea that I am terrified of speaking publicly) but lets just boil it down to two here:

Treating database design like application design - The reason why the concept of a distinct business layer is so intriguing to me is that we can transform the user’s views/needs from the GUI through this layer to the data layer. Each layer of the application has it’s own needs (UI, Object Layer Representation, Data Access Layer, Stored Procedure Layer, Database Structures) each very important in it’s own right. More importantly, each has very different design goals.
Forgetting about the future – During the design phase of any project, changes are very, very cheap. So if you realize you want to have a table with a domain of values instead of a simple varchar column, it is as simple as adding a rectangle to your model and connecting rectangles. Same goes with the discovery that the cardinality between two of your tables was not what you originally thought (“oh, a person is allowed to have more than one phone number in this system.”) However, as time passes, and more code is written, and more data is added to the system, the costs go up almost exponentially. Of course when the project manager has saddled you up and is giving you the crop, demanding results (that is their job after all) it can be all too easy to simply ignore the future and start coding. Always think about Future-You and how annoyed he is going to be when he realizes that had you just spent another hour three months ago, two weeks of work (of course corresponding to Future-You’s vacation to Disney World) could have been avoided.

There are so many others, bad naming, poor constraint usage, under-using stored procedures, ignoring security, weak testing, and so on and so on. But I think I can safely say that if you start with these two concepts, a great number of these bad habits will correct themselves.

Can you list any third party tools that you find useful to have as a SQL Server developer/admin?

I have become quite enamored with the Red-Gate tools, especially the database compare tool. I use it quite often to compare databases to find changes that have been made, especially for indexes and stuff done for tuning a database. The Apex tools are pretty good too.

The one tool I have used a lot lately has been the Automatic Indexing stuff from the query optimization team: (http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx). I am working right now on optimizing a homegrown/third party “object oriented”-esque tool that has a great deal of object abstraction built in, and have found tremendous performance gains from some really esoteric indexes this thing added to some of the queries that I would have never thought of in a month of Sundays. I wrote about it in more detail here: (http://drsql.spaces.live.com/blog/cns!80677FB08B3162E4!1443.entry and yes, I know this is the second time I have mentioned it, it is that important!)

In general, I am not so much a tool guy as I am a script guy. I use the UI to script out objects quite often, but then I try to edit everything with T-SQL so I know how it is done. The people who get too comfortable in the wizards or GUI often forget or don’t learn all of the tricks that you can do with the raw SQL commands, because it would just be too hard to build the perfect UI. So they give you 50% in the wizards and 90% in the GUI, and the rest is “advanced” stuff. If you don’t know about the nuts and bolts, you might be tempted to say “SQL Server doesn’t…”.

Lastly I have always been a big fan of ERwin for modeling. It does a really good job of representing my structures, and I like their macro language for building very customized scripts to build tables/views/procedures/etc.

What is the biggest Database in GB/PB that you have ever worked with, how many rows was your biggest table ever?

It was about ten feet across, six feet high. I had just started my new job as a LAN administrator and occasionally I had to fill in for the night operator. We had these 14 inch across reels of something that looked like a reel to reel tape, but they….wait, do you mean gigabytes and petabytes? I was thinking Great Big, or Pretty Big.

Well definitely no petabytes, and only recently have we started thinking in terms of terabytes, some of which is due to the fact that we have started storing everything as Unicode, essentially doubling the size of the database. We probably have around a half of a terabyte in our OLTP system and Data warehouse combined. I am never too excited about the size of a database as I am of the number of rows in a key table. You could have 10 billion rows and a very small database because it was designed nicely, or to the contrary, a million rows and have a terabyte if you just stored all of your data in text blobs (which is a favorite kind of storage for too many people.)

The largest database I designed I never got to see come to fruition because our company went through a couple of downsizing “events”, which I was part of the next to last one. I only know it was large because a prior coworker of mine called me up one day and told me we had gone past the 2 billion row mark and they were to changing to use bigint rather than basic int types for the surrogate keys. Tons of rows, but really they were quite small rows. We were keeping information on Internet equipment that the company had deployed. It was a pretty cool system that I wish I could see in action today (if it is still being used. The concept was neat, but as the implementers we were never 100% sure if the data would be useful later in the process.)

When did you start blogging and why?

When is easy, almost two years ago when I started writing my last book. The answer to why is pretty much the same as the answer to the technical books question, but increased an order of magnitude (as a reminder, that answer was: Two reasons: so I would have a reference to look into when I need it; and because I hate having free time.)

Blogging is possibly the biggest time sink on earth really. When I am working a “normal” 40 or so hour week in my day (read: paying) job, it is easy to spend an hour or so a day blogging, but when my job squeezes my time, it gets really hard.

But the most exciting part of blogging is the incredible active reference it gives me, in a format that really works for me. I can also use more of a shotgun approach and just write about what I want to write about on any topic, SQL or otherwise, rather than just writing about database design exclusively.

Over the years, I have tried creating libraries of code on my laptop, but that never has seemed to work for me. As I have aged (a little) I have started to realize that commenting, blogging, writing, etc is not for you, it is for Future Me and other people. I have so much code on my laptop from the past 10 years that I don’t know what the heck to do with it. I have VB6, SQL from past ages, failed article attempts, diagrams, etc all over my projects directory. So Past Me (that jerk!) is punishing me with his laziness, something I now vow not to do to Future Me. Confused? Good.

I started to realize that my blog had gotten out of control with so much information that I could no longer search and find when I found OfficeLive.com, a Microsoft site that allows you to create a free website for your small business. So I created drsql.org as a reference for my books, presentations, and my blog. On there you will find a decent enough index to my blog entries, usually within a month of being up to date, as well as a list of about ten articles that I can get to from the front page just about instantly.

About the book:
By Louis Davidson, Kevin Kline, Kurt Windisch
ISBN: 1-59059-529-7
672 pp.
Published: May 2006

Table Of Contents
CHAPTER 1 Introduction to Database Concepts
CHAPTER 2 Data Modeling
CHAPTER 3 Conceptual Data Modeling
CHAPTER 4 The Normalization Process
CHAPTER 5 Implementing the Base Table Structures
CHAPTER 6 Protecting the Integrity of Your Data
CHAPTER 7 Securing Access to the Data
CHAPTER 8 Table Structures and Indexing
CHAPTER 9 Coding for Concurrency
CHAPTER 10 Code-Level Architectural Decisions
CHAPTER 11 Database Interoperability
APPENDIX A Codd’s 12 Rules for an RDBMS
APPENDIX B Datatype Reference
INDEX

Sample Chapter:
Ch. 02 - Data Modeling

Amazon Links: