SQL Server Code,Tips and Tricks, Performance Tuning: Interview

Showing posts with label Interview. Show all posts

Thursday, September 18, 2008

Interview With Louis Davidson Author of Pro SQL Server 2008 Relational Database Design and Implementation

I interviewed Louis Davidson in 2005 about the 2005 edition of his book. I did another one today about Pro SQL Server 2008 Relational Database Design and Implementation. You can check out the interview at the link below, you will enjoy it

Interview With Louis Davidson Author of Pro SQL Server 2008 Relational Database Design and Implementation

Wednesday, July 02, 2008

Interview With SSIS Guru Jamie Thomson

As promised yesterday, here is the link to the interview I did with jamie Thomson: Interview With SSIS Guru Jamie Thomson

Tuesday, May 27, 2008

Interview With Erland Sommarskog About SQL Server and Transact SQL

I have interviewed Erland Sommarskog, you can find that interview here: Interview With Erland Sommarskog About SQL Server and Transact SQL

Enjoy

Friday, May 23, 2008

Interview With Craig Freedman About Indexing, Query Plans And Performance

I interviewed Craig Freedman about Indexing, Query Plans And Performance and more. You can find that interview here: Interview With Craig Freedman About Indexing, Query Plans And Performance

Enjoy

Tuesday, May 06, 2008

What Would You Like To Ask Erland Sommarskog?

In the Who do you want to see interviewed next?(sqlblog) and Who do you want to see interviewed next?(here) blog post I asked for some names of people who YOU would like to see interviewed. Erland Sommarskog's name was submitted in comments a couple of times. I contacted Erland and am happy to anounce that he has agreed to do this.

The first time I heard of Erland Sommarskog was in the SQL Server programming newsgroups. I also read all the articles on his site. When I answer questions I usually give the link out to these 4 articles
Arrays and Lists in SQL Server
Implementing Error Handling with Stored Procedures
Error Handling in SQL Server – a Background
and of course The curse and blessings of dynamic SQL

I always wondered why he didn't write a SQL book. Hey, now I can ask him that :-)

Before you think of any questions please visit Erland's page first (http://www.sommarskog.se/index.html) and read some of the articles he wrote.

You can submit questions until Monday May 12th and then I will forward the questions to Erland.

What Would You Like To Ask Craig Freedman?

In the Who do you want to see interviewed next?(sqlblog) and Who do you want to see interviewed next?(here) blog post I asked for some names of people who YOU would like to see interviewed. Craig Freedman's name was submitted in comments a couple of times. I contacted Craig and am happy to anounce that he has agreed to do this.

Before you think of any questions please visit Craig's blog first(http://blogs.msdn.com/craigfr/) to get a feel of what kind of questions to ask. It would make sense to keep the questions focused on query processing, query execution, and query plans. I told Craig that if he doesn't like a question that he does not have to answer it.

You can submit questions until Monday May 12th and then I will forward the questions to Craig

Thursday, May 01, 2008

Who do you want to see interviewed next?

I have interviewed a bunch of people already, some of them on this blog and some of them on SQLBlog

Below is the list of all the people so far

Louis Davidson
Itzik Ben-Gan
Ken Henderson
Kalen Delaney
Adam Machanic
Interview With Stéphane Faroult About Refactoring SQL Applications
Database Refactoring Interview With Scott W. Ambler
Interview With Joe Celko About The Forthcoming Book Thinking In Sets

Who else should I interview? I have some ideas but I would like you to participate. Here is my thinking, after we have a list of possible people, I will contact them and ask if they will agree to an interview. If they agree I will make a post here and ask you what you would like to ask this person. I usually have a good set of questions but there might be someone out there with some real good/interesting questions.
So post the name of the person you would like to see interviewed and hopefully this person will agree.

It would be nice if you did not pick a person who blogs at SQLBlog but if_you_really_wanted_to then I will not stop you :-0

Please keep it database related, it doesn't have to be specific to SQL Server but it has to be applicable to SQL Server.

Monday, April 21, 2008

Interview With Stéphane Faroult About Refactoring SQL Applications

I decided to interview Stéphane Faroult, the author of The Art of SQL because it is one of my favorite SQL books. Stéphane told me he has a new book coming out soon, the name of this book is Refactoring SQL Applications and the majority of this interview is focused on that book.

You can find that interview here: Interview With Stéphane Faroult About Refactoring SQL Applications

Wednesday, April 02, 2008

Database Refactoring Interview With Scott W. Ambler

Since I am doing scrum and other agile methodologies myself I decided to interview the authority on agile in the database world.

Scott W. Ambler is the Practice Leader Agile Development for IBM Rational

You can find more about Scott here: http://www.ibm.com/software/rational/bios/ambler.html

I did not feel like duplicating the content here so you can read the interview here: Database Refactoring Interview With Scott W. Ambler

Monday, December 17, 2007

Interview With Joe Celko About The Forthcoming Book Thinking In Sets

I noticed that Joe Celko has a new book coming out soon: Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL. I decided to send Joe an email to see if he would be willing to answer some questions I had about the book and SQL in general. To my surprise Joe was more than willing to accommodate my request. The question-and-answer session with Joe that follows was conducted via email.

Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
I would say intermediate level. You need to know enough SQL to do some programming in the language and be experienced enough to see that DDL is as important as DML.

What are the most important things a person can do to master SQL?
The most important thing is to make the leap from procedural programming to declarative programming, from sequential files to sets. The only declarative language that most programmers have seen is spreadsheets. They are nothing like SQL!

I assume you wrote this book because people when they first use a database tend to do the same thing they do in Java, C# or other procedural languages; get a bunch of rows and manipulate them one by one. Is this the number one mistake beginners do?
That is right up there in the top five, for sure! But I think that the classic error is in not knowing how to design the schema in the first place. A bad schema -- denormalized, bad data element names, no constraints, no proper keys, no referential integrity -- leads to trying to correct the flaws in DDL in the DML. If you have a good schema, then the queries, inserts, updates, and deletes are usually fairly easy. All of the "hard work" should be hidden in the database and not spread over the application code and DML.

What is so bad about attribute splitting (somehow these three tables come to mind: Squids, Automobiles and Britney Spears)?
NO, no, no! I coined the phrase a "Squids, Automobiles and Britney Spears" table or column to refer to a table or column which has more than one kind of entity or more than one kind of attribute in it. What makes that phrase so fun is that there is a web page which compared photos of a giant squid and Britney Spears after she cut off her hair. If you actually gave such tables or columns a meaningful name, then you would see that it is not a proper table or column. These nightmares would have names like "hat_or_shoe_size_depending_on_column_five" Attribute splitting is when you take an attribute and make it into two or more tables, columns or rows. The table example is the most common one. How often have you seen a table for each location (split on geography) or a table for each month (temporal split)? This mimics a tape file system, so newbies feel comfortable. When you ask them if they would split a Personnel table into MalePersonnel and FemalePersonnel (split on gender), they immediately see the fallacy. Unfortunately, splits need all kinds of code if they want any data integrity. This code usually re-assembles the data elements back to what they should have been in the first place.

In chapter two of your new book you talk about the new advances in hardware. I have noticed that somehow the amount of data I have to store out paces the advance in hardware and the queries don't run any faster. Will Solid-State Disks finally change that?
I have great hopes for solid state devices. Any solid thing that has to physically move is by definition slower than electricity or light. We are looking at nano-tech, better lasers and a ton of new technology almost every month.

I see you have a whole chapter on auxiliary tables, I am a big fan of those tables myself and I use them to create dates on the fly or split comma delimited strings. In your opinion what percentage of developers does not use them and why should you use them?
Not enough. SQL is a language meant to handle data and not to do computations. Auxiliary tables can be shared among sessions and accessed in parallel. Imagine a complicated but deterministic formula. In a procedural computational model, you hang in a loop and re-do the work for each record in a sequential file. Let me give you a real-world example of that. Corky's BBQ does a huge mail-order business at Christmas time. The pick lists need to include the size of the boxes to be used, so part-time help can do their jobs. When I got this problem, they had discovered that using weights was not right way to go. The approach being attempted was to play "3-D Tetris" with the products. Since that kind of thing is a bitch to program, they were getting nowhere. My approach was to look over a few years of shipping history and find out how many different orders they had shipped and what the smallest box use for each of those orders was. There were only about 5000 configurations and the majority were standard gift packages. Do a relational division and a table look-up to handle 99.98% of the cases and print the phrase "Hand pick this order" on the pick list for the exceptions! In the SQL model, you join an auxiliary table which has the parameters and the result value in each row. You can do this simple multi-column equi-join once in parallel. Wait until the multi-core chips make parallelism the only way to design a computer; then auxiliary table are going to really fly.

I noticed you have a big chapter on VIEWs; do you feel that VIEWs are not utilized enough by developers?
They are used either not enough or too much.

Why do you think we should not use bit or Boolean flags in SQL?
In SQL, to be a data type it must allow NULLs. What is the fundamental nature of a BIT? It is one or zero; there is no NULL concept here! This was a problem for SQL Server, when they made their BIT data type into a numeric that could be NULL-able. The change scrambled a lot of data when it was ported from one release to the next. Bits are low-level, hardware dependent concepts. Are you high end or low end hardware? Do you have 4, 8, 16, 32 or 64 bit machine words? You have use proprietary operators. This defeats the idea of machine independence. Finally, bit flags are used to destroy First Normal Form (1NF) and thus destroy data integrity. As an example, recently in as newsgroup someone wanted to use a 4-bit column to store all the possible colors for a product (red, green, blue, yellow) and get them out with a bit mask (his hardware has a nibble!). But how do you add purple? How do you set up a constraint that no item can be mad with both red and green options? In RDBMS, we discover the state of our data with predicates and not by setting flags at the hardware level.

I have your puzzles book and noticed that you have a paragraph on Sudoku and one on Bin Packing in the last chapter of this book. You have procedural solutions and SQL solutions for most of the material in the chapter; is the SQL solution faster?
There is a funny story on the Sudoku problem. Richard Romley is a retired DBA from Smith-Barney and he decided to play with Sudoku for recreation. He used SQL Server on his home machine and coded a solver in one SELECT statement. The procedure takes 81 parameters (the starting grid) and does an 81-way self-join. It produces ALL the valid answers -- bet you did not know that many published puzzles have multiple solutions! The code is straight forward and depends on the optimizer to handle the search condition logic. Even longer problems with tens of answers run in well under one second. The procedural solvers vary, but I have seen some that stop when they get to the first valid grid. If there is only one solution, they are very fast. But is the right answer actually the *set* of valid grids? Since I am an SQL person, I think so. The procedural solvers can get hung up by backtracking to the starting position when there are a few hundred answers and become very slow. I also strongly recommend getting the Japanese or Chinese editions if you read either language. My two translators cleaned up some old code and added new solutions as we went along.

Should having good naming conventions such as 11179-5 be included in database courses?
Drop them in as soon as you start. If you grow up with good conventions, you will start doing it without thinking about it. When I teach RDBMS, I start with scales and measurement theory so that my students know what data is all about -- whether it is in a database or not.

When can we expect your new book to show up in bookstores?
It was supposed to be out in 2008 February, but we lucked up and it will be out in 2008 January. Production was faster than planned. I guess after seven books, and working with the same people, we have it down pretty well!

A bunch of questions not related to the book

Why do you write technical books?
I have no talent for fiction. I cannot get a plot or characters onto paper to save my life and my dialogues are awful chains of "he said-she said" stiff sentences. My grandfather wrote children's poems in Slovak, and I have even less talent for poetry. I have a number of friends who write detective novels, Science Fiction and Fantasy. They don't consider me a real author because I don't do fiction. I think about trying my hand at YA (Young Adult) books -- Danica McKellar (Winnie from the WONDER YEARS television show) just did a math book for girls, so maybe I could do "A Child's Garden of Normal Forms" or a juvenile detective series called "The Hardware Boys", then go on to a television show called "Query Eye for the Database Guy" or something.

Which of your books was the hardest to write and why?
First edition of SQL FOR SMARTIES! It was my first book and I thought that having written a few hundred magazine columns would make it easy. I was dead wrong -- completely different skill set. I was a year late in delivering the manuscript. After that, I had a system in place.

What is coming down the line? Any new books or updates to current ones?
I am trying to do at least one book per year -- more if I am unemployed and need the advances. My current thoughts are a book on the use of Standards in a database, and one on programming tricks with OLAP functions, CTEs, and other new features in the SQL-2003 Standards. My Morgan-Kaufmann books tend to follow a five year cycle, just like the ANSI/ISO Standards. I also get asked by vendors to do product specific books. I might self-publish something completely off-topic. I have a book on domino games based on my postings and I teach Texas-42 on Royal Carri bean Cruises -- it is a domino game only played in Texas. I also have a book on Pai Gow (a gambling game played with Chinese dominoes) which might sell 10 copies. I will also be doing some video classes, but I don't have details yet.

Which of all the SQL books that you wrote is your favorite?
DATA & DATABASES, which never got the sales of the others. It is more philosophical and concerned with the nature of data instead of programming.

What SQL Server books are on your bookshelf?
Anything I can get by Henderson, Machanic, Moreau, Ben Gan and Delaney. The SQL Server experts are pretty well-known and they publish. This is not true for other products, especially the open-source RDBMS products.

Why do you participate in newsgroups and do you think it is a good idea for beginners to ask questions in newsgroups?
To do some shameless self-promotion, of course :) Newsgroups are a good source of SQL problems and some clever answers that I can use in books and when I am consulting. I also have a pedantic streak I did to get out. And if I am available on a newsgroup, people don't fill up my mailbox at home. And, yes, beginners should use newsgroups for help. But not to have someone else do their job or their homework for them. I like to see the mindset of people who are just learning SQL. It is not enough to see that someone is making a mistake; you want to figure out what lead to that particular mistake. Remember Chernobyl? Everyone did just what they were supposed to do, but there were a few critical assumptions that lead to an event cascade.

What are Cowboy Coders and id-iots?
The term "cowboy coder" is an old one. It means someone who starts coding without any design phase, without any overview to the system as a whole, without any research for industry standards or a company data dictionary. They usually love dialect code and tricks that trade immediate performance for maintainability. The heavy dialect code also gives them job security, since they usually only know one product. An "ID-iot" is a newbie who has no RDBMS education and wants to have the comfort of a sequential file system. So he puts an IDENTITY column on every table as the PRIMARY KEY. Never mind that it is proprietary and non-relational; it is the familiar record number from a file system which can use to mimic pointer chains. He does not understand that rows are not records, tables are not files, columns are not fields and references are constraints and not pointers.

I have been working with Sybase IQ for a little bit; what is your opinion on columnar databases?
Sybase IQ is not the only game in town. I consulted with SAND (nee Marcus, nee Nucleus) years ago. It was one of the first such products. Later I ran into WX2 (nee White Cross) and I am looking at Stonebreaker's Verticia now. Their advantages in parallelism and compressing large amounts of data make them the best choice for Data Warehouses. I would also look at Teradata, which uses hashing. That will become more important as the research on minimal perfect hashing functions gets out of the lab and into products.

Where can we expect to see you in 2008? Any conferences, seminars, trade shows or classrooms perhaps?
I will hopefully be doing some more "SQL Saturdays!" on my weekends. I want to do more webcasts, but I am not sure if I am ready for YouTube. My other travel goal for 2008 is to get to Australia or Japan; I have never gone past Hawaii.

Some of Joe Celko's Books:
SQL for Smarties
SQL Programming Style
Trees and Hierarchies in SQL
SQL Puzzles and Answers
Data and Databases

Sunday, December 02, 2007

The Sad State Of Programmers Part 1 : The Phone Interview.

This is going to be a three part series.
Part 1 The phone interview.
Part 2 The face to face interview.
Part 3 Some tips and observations

A while back I posted that we are looking for a SQL/.NET/FoxPro developer. I did this because we had a real hard time finding this person. I am happy to inform you that we did find this person and he will start in two weeks. Interestingly enough we hired the person with the least years of experience (on paper). This person knew more that people with three times his experience in years.

These days when looking for a programmer you have to do phone interviews if you don’t want to waste an incredible amount of time. A phone interview enables you to assess the skill set of a potential employee without wasting time by picking him up, getting a security badge, booking a conference room etc. A phone interview is also good for the candidate since he/she doesn’t have to travel or dress up to do the interview.

Some things are difficult to ask over the phone but if the candidate looks (or should that be sounds) good then you can ask those questions when you bring the person in. Some people will prepare for a phone interview by having all their books and notes in front of them. They will ask you to repeat the question and while you do so you can hear them flipping pages frantically. So you might be able to cheat on the phone interview but be assured that if you do not know your stuff that you will fall flat on your face on a face to face interview (no pun intended).

One thing I never understood is the fact that it takes a person one minute to answer a question. You either know or don’t know the question. Keep your answers concise, do not spend 3 minutes explaining to me what the difference is between a clustered and non clustered index.
I had to reword my questions slightly because when I asked a question like “Do you know what the difference is between a clustered index and a non clustered index?” some people would reply “yes”. Because of that I changed the question to “Describe what the difference is between a clustered index and a non clustered index?”

Do not shoot yourself in the foot by giving me additional information which is wrong. I asked for the fastest way to empty a table. Almost every single person who knew about truncate added that you cannot rollback a truncate statement. I wrote about that myth a couple of months ago: SQL Myth: Truncate Cannot Be Rolled Back Because It Is Not Logged

I tend to ask between 20 and 40 questions, if I see the candidate’s skill is not good enough I don’t ask everything. Some of the questions are esoteric but I simply ask these questions to get a feel of the overall skill level; it doesn’t matter if they answer these wrong. You can find a list of question here: How Well Do You Interview And Do You Use Wizard Driven Programming?

Here are some interesting answers from the interviews.
Almost every single person answered that an index scan is better than an index seek.
There were several people with SQL Server 2005 experience, these people couldn’t name one single new thing introduced in SQL Server 2005. I asked about windowing functions, DMVs, pivot, apply and more, this was all Greek to them. One person had on her resume that she developed an app in SQL Server 2005. When I asked about her experience she told me she just started to read about SQL Server 2005. This is a big show stopper, sometimes headhunters/recruiters will tell you to just add it to your resume, I wouldn’t do it because it makes you look bad. If the SQL Server 2005 experience is not true what else could be made up? One person had on his resume that he optimized complex stored procedures, when I asked how he did it, he replied that he only selected the rows he needed instead of the whole table. This obviously didn’t answer my question.

That is it for the phone interview, part 2 will be up in a day or two.

Monday, October 15, 2007

Interview With Kalen Delaney About Inside Microsoft SQL Server 2005 Query Tuning and Optimization

I am a big fan of Inside Microsoft SQL Server 2005 The Storage Engine so when I saw that yet another book got published in this series I just had to check it out. Inside Microsoft SQL Server 2005: Query Tuning and Optimization is very well written gets to the point and give you the answers that you need. I decided to contact Kalen to see if she would be willing to do an interview, by now you know that the answer is yes of course ;-)

The question-and-answer session with Kalen that follows was conducted via email.

What is the audience for this book, is it the enterprise user or can a small department benefit from the tips in this book?
Because this book deals with query tuning, anyone who writes SQL queries for SQL Server can benefit. Very little in the book is geared towards system tuning, so the size of the machine doesn’t really matter. Now of course, if you have a very small system with very small tables, you won’t get as much benefit out of tuning your queries. However, if you have any tables of more than a few thousand rows, and you do any joins, you will need to tune your queries. In addition, the issues of blocking and concurrency control can impact any system, no matter how small.

What new technologies in SQL Server 2005 do you think are the most beneficial for performance?
For very large databases, the best new technology is partitioning. For any size system, if you have had serious performance problems due to blocking, you might find a big performance benefit by using one of the snapshot-based isolation levels, but you really need to understand the resource costs that come along with the improved performance. For your individual queries, I think the new optimizer hints and query level recompiles can make a big difference. For indexes, the ability to add included columns to nonclustered indexes can give some of your hard-to-tune queries a major performance boost.

What will a person who reads this book gain in terms of understanding how to performance tune a server?
The focus of this book is not so much on tuning the server, but on tuning queries. There is more in Inside SQL Server 2005: The Storage Engine on server issues such as memory and processor management. The biggest server wide issues are covered in Chapter 5, when I talk about managing the plan cache, and how and when query plans are reused.

Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
The book is not geared towards beginners, but everyone should be able to get something out of it. It’s primarily geared to SQL Server developers and DBA’s who have been working with SQL Server for a while, and have encountered performance problems that they are trying to find solutions for.

With all the changes in SQL Server 2005, how critical has the tempdb become in regards to performance?
Tempdb has always been important. In SQL Server 2005, if you are using one of the snapshot-based isolation levels, you are going to have to be more aware of the demands placed on tempdb, both in the sizing requirements and the additional I/O resources needed. Fortunately, SQL Server 2005 provides tools to monitor tempdb, including a dozen new performance monitor counters, and a dynamic management view, sys.dm_db_file_space_usage, that keeps track of how much space in tempdb is being used for each of the different kind of object stored in tempdb.

I understand that this is the first time you wrote with a team of other writers; can you tell us something about that experience?
I initially thought that not having to write the whole volume by myself meant that I could get it done sooner, but that was not the case. Everyone had their own schedule and their own way of writing. The personal aspect of working with the other authors was great. I deeply respect all of the others and it was an honor to be working so closely with them. I had some concerns about the depth of coverage and I wondered whether all the chapters would end up being as deep as I hoped for, but that turned out not to be a major problem. The only real issues were agreeing on a common terminology and coding style, and even that wasn’t that big of an issue, because I got to do a final editing pass on everyone’s chapters.

What SQL Server books are on your bookshelf?
All of the Inside SQL Server books are there, of course, and all of Ken Henderson’s books. Bob Beauchemin’s book is in my car, to read while I am waiting for the ferryboat, and while on the ferry. I also have technical books that aren’t SQL Server specific, such as Jim Gray’s Transaction Processing, Russinovich’s and Solomon’s Windows Internals, Chris Date’s Introduction to Database Systems and Mike Stonebraker’s Readings in Database Systems.

Why do you write technical books?
I love working with SQL Server and trying to find out all I can about it. When I found that I could explain difficult concepts in a training environment, I thought I could do the same thing in a written format, and reach more people that way. I have always loved explaining things, ever since I was a teaching assistant for High School Math.

Will you be updating your books for SQL Server 2008?
I have just started meeting with my editors at Microsoft Press about SQL Server 2008, and it looks like a revision is in the plans. We’re really looking at it as just a revision, with the same structures as the current books, with straightforward changes and the inclusion of new features.

Name three things that are new in SQL Server 2005 that you find are the most valuable?
Dynamic Management Views, Dynamic Management Views, and Dynamic Management Views!
Oh, you wanted three different things? ;-) How about XML query plans and optimization hints. (I’m also very fond of many of the new TSQL constructs, but I was only talking about things that I cover in my new book.)

Name three things which are coming in SQL Server 2008 that you are most excited about?
You’ll have to ask me this next time. I have actually been avoiding SQL Server 2008 while I was getting my Query Tuning and Optimization book finished, because I didn’t want to get distracted.

Can you list any third party tools that you find useful to have as a SQL Server developer/admin?
I’ve tried a few other products, but usefully I find that it is much easier to just stick with the Microsoft line and use the tools provided with the product.

Name some of your favorite non-technical books.
Oh, I love to read. It would be impossible to list my favorite books, but I can tell you my favorite authors, most of whom write science fiction: Lois McMaster Bujold, Ursula LeGuin, Sheri Tepper, Orson Scott Card, Octavia Butler, Elizabeth Moon. I also love to read historical fiction like Leon Uris.

Wednesday, May 16, 2007

Interview with Adam Machanic Author Of Expert SQL Server 2005 Development

I am a big fan of Adam Machanic and the way he thinks about SQL server development. How may SQL developers unit test their procedures or use assertions in their code? Well I can guarantee that there are very few of those. I was very excited when Adam Machanic contacted me to give me a copy of Expert SQL Server 2005 Development for review. I did not finish reading the whole book yet but what I have read so far is very good. If you are an advanced developer and want to kick it up a notch then this is the book for you.

The question-and-answer session with Adam that follows was conducted via email.

What is the audience for this book? Is the book geared towards a beginner/intermediate level user or do you have to be an advanced user to really utilize the information in this book?
The book is certainly not for beginners; I make a lot of assumptions about the level of knowledge possessed by the reader. The reader should understand the basics of writing T-SQL, creating indexes, reading query plans, etc. It's difficult to differentiate between intermediate and advanced users, as everyone has their own opinion of what those levels actually mean! I've covered several not-so-common topics, and my hope is that that almost any SQL Server practitioner who has advanced beyond the basics can learn something from the book.

What are the most important things a person can do to master Transact-SQL?
Practice, practice, and more practice! The more complex queries you write, and the greater the variety of problems you solve in T-SQL, the easier it gets. Just like other programming languages, T-SQL programmers use and reuse various patterns as they work on different projects and problems. Gaining skill in T-SQL is really nothing more than a process of learning when and how to properly apply the patterns to any given situation.

Adam, the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?
When I first started working with SQL Server and other Microsoft technologies I discovered Microsoft's news server and was shocked to discover the overall quality and high signal-to-noise ratio compared to many other (Usenet) newsgroups I'd previously frequented. Questions I asked were actually answered--by people who knew what they were talking about, no less! After spending quite some time asking my own questions I realized I could answer a few here and there too, and give something back to the community that had helped me so much. Over time I became hooked. What a great way to kill free time (especially while waiting for software to compile or long queries to run)--not only does it feel good to help others solve their problems, but it also helps to strengthen my skills as a developer. As I mentioned in the previous question, the most important key to software development greatness is practice. By answering newsgroup and forum posts, you can get exposure to all sorts of business problems and scenarios that you would otherwise never see. There is no better learning experience that I have found.

Why is it important to think in sets instead of procedurally?
It's often said that thinking in sets takes advantage of the declarative nature of SQL; and while this is certainly true I think it's important to drill down to why that's such a big deal. SQL query performance, generally speaking, comes down to I/O performance. Too many I/Os, or an I/O-bound system, and your query won't return fast enough--simple as that. When we write set-based queries, we allow the query engine to optimize in order to minimize the number of index requests--and, therefore, I/O operations--necessary to satisfy the query. Working procedurally, with loops, means that the opposite happens--the maximum number of index operations will happen on every invocation, and therefore the query will perform in constant time—at the worst possible level.

Why do you feel it is important to properly unit test procedures?
I cover unit testing in some detail in Chapter 2 of the book, including when and why you want to do it. But what it boils down to is that creating a solid unit test infrastructure helps to guarantee that as you change your code over time, you won't break anything--and if you do, you'll find out about it very quickly. Studies have shown that by far the most expensive part of the software development process is maintenance. This is due to the fact that as we change our software, we break other pieces (dependencies)--and oftentimes, those dependency breaks silently cause subtle issues, that aren't noticed until long after the change was actually made. This can make it extremely difficult to track down and fix the root causes of problems. Comprehensive unit testing makes it possible to eliminate a lot of that delay, by alerting the developer right away if something doesn't look good. This, in turn, results in better quality software, happier customers, more business, and as a result, bigger bonus checks. A win-win situation, if you ask me.

Which chapter was the hardest to write and can you explain why?
Chapter 10, "Working with Temporal Data," took me four months to write. It's my favorite chapter in the book, and one of my favorite topics to think about, but it covers a huge variety of scenarios. As I mention in the chapter, temporal data finds its way into every single database, so it's a large and far-reaching topic. Attempting to cover it in a fairly complete manner turned out to be quite a big task!

Why do you write technical books?
Mainly for the groupies. But aside from that obvious perk, I do it simply because I love sharing technology and techniques with others. Writing a book is similar to participating in a newsgroup or technical forum, but on a much deeper level. Whereas I might answer a question on a forum in 400 words, or write a 3,000-word article, in a book I can cover the same topic in 15,000 or more words--really guide the reader through all of the pitfalls and ins and outs of a given area, and do things that would be impossible in another media (except, perhaps, in a long series of blog posts or articles). Plus, let's be honest--it's just a real thrill seeing my name on the cover of a book!

Who are your favorite authors?
For technical titles, my favorites are Steve McConnell ("Code Complete," "Rapid Development"), Joe Celko ("SQL For Smarties" and others), and Andrew Troelsen ("Pro C# and the .NET Platform"). These authors manage to convey heavy technical topics in very readable formats--which is something I aspire to doing in my own writing.

Unfortunately, I haven't been reading much fiction recently. The last really great novel I read was "Cloud Atlas" by David Mitchell. I'm also a longtime fan of science fiction writers such as William Gibson ("Neuromancer") and Neil Stephenson ("Snow Crash," "Cryptonomicon").

I noticed you have a chapter that deals with application concurrency, do you feel that developers don't have a multi-user/multi-app mindset when first developing the data model/procedures?
With the recent trend towards multiprocessor and multicore machines, threading and synchronization have been hot topics. I've read lots of interviews and articles about how many developers don't understand multithreading well enough. Yet as database developers, I feel like we've been there for years--the database is almost always a shared resource, and managing synchronization is no less important there than in the application tier. A lot of the time we tend to think about concurrency merely in the context of isolation levels and the like, but there is a lot more to it than that--an application will often push much of its synchronization logic all the way into the data tier (whether on purpose or not), and how to properly handle things is really more of a business question than a technical one. When writing that chapter, I tried to frame the issues in terms of business requirements, and then show how to use SQL Server to satisfy them. I think that developers are sometimes prone to getting too excited about technology and trying to find a business problem to which they can apply some new technique--the whole "hammer looking for a nail" mentality--rather than the correct route of determining the business problem and then using technology to solve it.

What SQL Server books are on your bookshelf?
I've now gotten rid of most of my SQL Server 2000 books, except for Ken Henderson's Guru's Guide" series and Kalen Delaney's "Inside SQL Server 2000". My bookshelves (note the plural!) are filled with SQL Server 2005 books--for the first year or so after its release I managed to get almost every new book that was released. I eventually slowed down a bit, but it's hard--there is a lot of great material out there! I'm a really big fan of the "Rational Guides" -- short, concise, yet in-depth guides to just one topic, generally very well-written and easy to digest.

Why did you decide to include a chapter on spatial data?
Through nothing more than luck of the draw I ended up on a couple of projects involving heavy spatial queries, and found that it was an area I really enjoyed. I love to solve difficult problems, and since comparatively very few people are doing spatial in SQL Server, I found that there was a lot of room to innovate--so I ended up having a great time with these projects and totally got hooked. I feel that spatial data is an area we're going to see a lot more of in the future. Spatially-enabled devices are becoming more and more prevalent, and of course all of those devices are going to need data. Unfortunately, due to time constraints I was not able to write that chapter, so my friend Hugo Kornelis tackled it. Hugo did an absolutely fantastic job of covering the topic--I think it's a great chapter, and will help readers get right up to speed.

Are you still making your own beer?
I live in a fairly small apartment at the moment and am somewhat cramped for space as well as time, so I haven't brewed a batch beer for the last couple of years. However, I have been experimenting with ciders quite a bit, and have made a variety of batches using all sorts of adjuncts including honey, brown sugar, and malt extract. Cider is very forgiving and needs to sit around for a long time before it's drinkable, so it's easy to make in a small space with limited time. I've also made a few batches of ginger beer, using a recipe that we actually designed in an online SQL Server forum!
(http://www.simple-talk.com/community/forums/thread/1232.aspx )

Table Of Contents
CHAPTER 1 Software Development Methodologies for the Database World
CHAPTER 2 Testing Database Routines
CHAPTER 3 Errors and Exceptions
CHAPTER 4 Privilege and Authorization
CHAPTER 5 Encryption
CHAPTER 6 SQLCLR: Architecture and Design Considerations
CHAPTER 7 Dynamic T-SQL
CHAPTER 8 Designing Systems for Application Concurrency
CHAPTER 9 Working with Spatial Data
CHAPTER 10 Working with Temporal Data
CHAPTER 11 Trees, Hierarchies, and Graphs
INDEX

Sample: Chapter: Ch. 01 - Software Development Methodologies for the Database World
Amazon Link: Expert SQL Server 2005 Development

Sunday, April 15, 2007

Interview With Itzik Ben-Gan Author Of Inside Microsoft SQL Server 2005: T-SQL Querying

If you are a SQL developer then you know who Itzik Ben-Gan is. If for some strange reason you don’t know then listen up. Itzik is a SQL Server MVP, he writes a monthly column for SQL Server Magazine and he is also a principal mentor and founder of Solid Quality Learning. You can also find him in the SQL Server programming public newsgroups. If you browse the MSDN site you will find several articles written by Itzik.

I purchased Inside Microsoft SQL Server 2005:T-SQL Querying and must say it is my favorite SQL Server 2005 book so far. Chapters 3, 6 and 9 alone are worth the money for the book. I have posted a link to a sample chapter (6) at the end of this post if you are interested to learn more about the style of writing and just to see how awesome this book really is.

Let’s get started with the interview

Q. What is the target audience for this book?
A. T-SQL developers with intermediate to advanced background.
By the way, the book is actually split to two volumes: Inside Microsoft SQL Server 2005: T-SQL Querying and Inside Microsoft SQL Server 2005: T-SQL Programming. Originally I intended to write one book, but I ended up with over 1200 pages; so I had to split it to two. But you should consider them as volumes 1 and 2 of one book. T-SQL Querying should be read before T-SQL Programming.

Also, I’d like to mention that several people contributed to the books, and I had great pleasure working with them. Lubor Kollar wrote Chapter 2 - Physical Query Processing of the T-SQL Querying volume and also provided great help and advice. Dejan Sarka wrote the CLR and XML content and explained what they have to do with the relational model. Roger Wolter wrote Chapter 11 - Service Broker of the T-SQL Programming volume. Steve Kass was the technical editor of the books, but contributed way more than what you would normally expect from technical editors. You can read about the contributors here:

http://www.insidetsql.com/

Q. What new technologies in SQL Server 2005 do you think are the most beneficial for developers?

A. I don’t think that it would be right of me to name two technologies in general, rather, depending on need. For example, for applications that involve XML manipulation, developers will find XML enhancements beneficial (and there are major ones in the product). For applications that need to implement at the database computationally intensive calculations, complex calculations, iterative logic, string manipulations, parsing, and so on, developers will find CLR integration beneficial. For applications that need queuing support, developers will find the new queuing infrastructure and the service broker platform beneficial. And then there’s row versioning embedded in the engine supporting the new snapshot isolations and other aspects of SQL Server; programmers will find this technology beneficial for certain types of systems that suffer from concurrency problems.

As for things that are closer to my heart; in regards to T-SQL, the two new features that I find most beneficial for developers are the OVER clause (e.g., with ranking calculations) and Common Table Expressions (CTE).

The OVER clause is really profound; I feel that it helps bridging the gap (or maybe I should say abyss) between cursors and sets. The OVER clause wasn’t implemented fully in SQL Server 2005, but even the existing implementation (especially with ranking calculations) allows simplifying and optimizing many pre-2005 solutions.

As for CTEs, they have both nonrecursive and recursive forms. The nonrecursive form has several important advantages over derived tables, allowing less code and better code maintenance. The recursive form allows for the first time in SQL Server’s history to write recursive queries, and very elegant ones. These are especially useful in manipulation of graphs (e.g., employee organizational chart, bill of materials, etc.).

Q. What will a person who reads this book gain in terms of understanding how the query engine works?

A. The approach I took in the books was first to understand logical query processing, then physical query processing (the engine), and then tackle practical problems where you need to combine your knowledge in both. It’s a problem based approach; I didn’t attempt to explain the engine for the sake of understanding how the engine works; rather to look at common practical problems developers face, write several solutions to each problem, analyze how the engine processes the solutions, optimize their performance, and simplify them. Ideal solutions are those that are both simple and perform well. A complex solution may have the side-effect of making the author proud of the ability to write such a solution, but is more prone to bugs and obviously involves maintenance overhead. The real beauty is in simple solutions that also perform well.

Regarding querying logic, the books put a lot of emphasis on correct understanding of SQL and thinking in terms of sets; this is a very though phase that developers have to go through since for most of us mere mortals it’s not a natural way of thinking.

Q. What are the most important things a person can do to master Transact-SQL?A. The quickest way is to issue the following code:

USE master ;-)

On a more serious note, I believe that there’s no quick way—there’s a way; it’s an ongoing thing. I can give recommendations based on what I try to follow. But some aspects of the way are personal; different things may work differently for different people. Anyway, my two cents worth…

Most importantly, you need to master the basics; or maybe I should use the term fundamentals. Ego and vanity can be big obstacles here. People try to jump directly to what they consider “advanced” and don’t get the fundamentals right. Advanced techniques are a matter of combining fundamental techniques. So be diligent and invest a lot of effort in understanding the fundamentals well. In terms of querying logic this means logical query processing, sets, NULLs, three-valued-logic, all-at-once operations, and so on. In terms of physical processing, this means getting to know internal structures, access methods, analyzing execution plans, and so on.

When facing new problems, try to solve them yourself before looking at others’ solutions. When you try something yourself you understand it best. This will also help you become more creative and develop your own techniques. Also, try different approaches, not just one, and compare the solutions.

Never consider yourself as being finished. Strive for perfection but never consider your solution perfect. Constantly work on problem solving; look for new problems and keep going back to old problems and try to improve the solutions. This way you polish and perfect your techniques and enhance your vocabulary.

I also feel that you get a much better understanding of a subject when you need to explain it to others.

I truly believe in all of the above, but ideally, you don’t need me to tell you these things. In fact I feel awkward and uncomfortable giving such recommendations; so if at this point you think that I’m a fart I won’t hold it against you. I feel more comfortable writing queries and talking about them. ;-)

As I said earlier, this simply feels natural to me. Though some things are common in the way to master any profession, some things are and should be personal; namely, you have to find your own way and not necessarily follow what someone else recommends to the letter. :-)

Q. Itzik , the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?
A. In the past I used to be much more active in the newsgroups; today less, but whenever I participate I feel revived—especially when visiting the public programming newsgroup. If I had to name one thing that had the greatest impact on my knowledge in T-SQL it would doubtless be my participation in the newsgroups. It’s a very fertile soil to grow. So many practical problems are posted there daily, and by trying to solve other people’s problems, you end up learning more and more. Also, many practical problems are repeated there frequently, and as time goes by, you keep perfecting your techniques, and also sometimes get to see very unique and interesting solutions posted by other people. People pay so much money to study in schools, colleges and universities; here’s a great place to learn for free, and while you’re at it, you end up helping other people. :-) It’s the perfect Dojo for T-SQL practitioners.

Q. How did the massive changes between SQL Server 2000 and SQL Server 2005 affect the research for your book?
A. I think I went through a different experience than some other authors. I’m so immersed in the T-SQL language that I feel that it’s my mother tongue. Seriously, in many cases I feel that I can express myself better with T-SQL than with English or even Hebrew which is supposed to be my native tongue. I was deeply involved in writing T-SQL code using new language elements from the very early builds of Yukon. Also, our company, Solid Quality Learning, works closely with the SQL Server development team and customer advisory team (CAT). We worked on whitepapers for beta builds and got access to specs in very early stages of the product. I also developed and delivered courses on Advanced T-SQL before writing the books. So writing the books did not involve research, rather the material was essentially part of me, and all I had to do was to convert my thoughts to words.

I have similar experience with writing in general. I have a monthly T-SQL column in SQL Server Magazine, and often people ask me where do I get ideas, and what’s the process involved. I work on so many ideas constantly not for the sake of writing, rather, it’s a natural part of me—the way I was talking about earlier. So when it’s time to write a column, it’s basically pulling something from one of the drawers in my mind and putting it into words.

Q. Name three things that you wish would be in Katmai (the next version of SQL Server)?
A. These are easy to name, but I doubt that my wishes will come true in Katmai:

1. Support the ORDER BY sub-clause of the OVER clause for aggregate functions; BTW, this has nothing to do with ordered aggregates, rather it’s a logical way to define to a window function which window to work with. I’ll provide a link to a paper which I wrote recently for people looking for more details.

2. Support the ROWS and RANGE windowing sub-clauses of the OVER clause for aggregate functions.

3. Add support for row value constructors/vector expressions.
In general I wish to see more focus on T-SQL enhancements, and specifically, a more complete implementation of the profound OVER clause. It bears great benefits within it that I believe many people are not aware of yet. You can find more details in the following paper:

http://www.insidetsql.com/OVER_Clause_and_Ordered_Calculations.doc

Q. Do you think that every developer should have a numbers table in their database?
A. That’s a good question that makes me smile. :-) Yes! If not a table, a table function. I’ve used an auxiliary table of numbers in so many solutions; I find it a great helper tool to solve many querying problems. Just as an example, here’s a query using an auxiliary table of numbers (called Nums with a column called n) that splits strings containing comma separated lists of values (stored in a table called Arrays, with columns arrid and string):

SELECT
  arrid,
  n - LEN(REPLACE(LEFT(string, n), ',', '')) + 1 AS pos,
  SUBSTRING(string, n,
    CHARINDEX(',', string + ',', n) - n) AS val
FROM Arrays JOIN Nums
  ON n <= LEN(string)   AND SUBSTRING(',' + string, n, 1) = ','  
ORDER BY arrid, pos;

And if you need to operate on a single array (e.g., one that you get as an input to a stored procedure), you can encapsulate the logic in an inline table function:

CREATE FUNCTION dbo.fn_split(@arr AS VARCHAR(MAX)) RETURNS TABLE
AS
RETURN
  SELECT
    n - LEN(REPLACE(LEFT(@arr, n), ',', '')) + 1 AS pos,
    SUBSTRING(@arr, n,
      CHARINDEX(',', @arr + ',', n) - n) AS element
  FROM dbo.Nums
  WHERE n <= LEN(@arr) AND SUBSTRING(',' + @arr, n, 1) = ',';  
GO    

SELECT * FROM dbo.fn_split('10248,10249,10250');

Output:
pos element
---- --------
1 10248
2 10249
3 10250

This implementation has major advantages (both in terms of security and performance) over the commonly practiced solution utilizing dynamic SQL.

Erland Sommarskog, a fellow MVP, wrote a great paper on the subject. You can find details here:
http://www.sommarskog.se/arrays-in-sql.html

By the way, you can vote for adding the table of numbers as part of the product here:

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=258733

As a tip, here’s a very fast inline table function that will return a table of numbers for you:

CREATE FUNCTION dbo.fn_nums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
  L1   AS(SELECT 1 AS c FROM L0 AS A, L0 AS B),
  L2   AS(SELECT 1 AS c FROM L1 AS A, L1 AS B),
  L3   AS(SELECT 1 AS c FROM L2 AS A, L2 AS B),
  L4   AS(SELECT 1 AS c FROM L3 AS A, L3 AS B),
  L5   AS(SELECT 1 AS c FROM L4 AS A, L4 AS B),
  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5)
  SELECT n FROM Nums 
  WHERE n <= @n;  
GO     

-- Test function 
SELECT * FROM dbo.fn_nums(10) AS F;  
GO

Output:

n
---
1
2
3
4
5
6
7
8
9
10

It’s even faster than querying a real table of numbers. ;-)

Q. Why is it important to think in sets instead of procedural code?A. SQL is based on the relational model, which in turn is based on set-theory and predicate logic. The relational model is a proven model for handling data—for several decades. It’s important to adopt set-based thinking in order to write simple and efficient solutions, otherwise, you defeat the whole premise on which the model is based. SQL queries (set-based) have several advantages over procedural/iterative code (e.g., using cursors) as far as data manipulation is concerned. Set-based solutions focus on the “what” part of the solution; namely their logic. Iterative solutions spend a lot of code on the “how” part, clouding the logic behind all the code; lengthy code naturally has maintenance overhead. Set-based solutions allow the optimizer to choose an execution plan among several possible plans based on cost estimations, while iterative solutions typically force a very rigid plan. In short, set-based solutions allow you to focus on logic, and speak with the database in its native language. Also, cursors have a lot of overhead associated with the record-by-record manipulation.

I’m not saying that ALL problems are best solved with set-based solutions, but the majority of data manipulation related problems are. Tasks that involve less data manipulation, and more computations, string manipulation, parsing, inherently iterative logic, these can benefit from procedural solutions. But T-SQL (and SQL in general) was not designed to handle such tasks efficiently. That’s an area where .NET is much stronger, richer and more efficient. These are the areas where I see the benefit in the CLR integration in SQL Server 2005.

Q. Which chapter was the hardest to write and can you explain why?
A. Chapter 3 – Query Tuning in the T-SQL Querying volume. It was the hardest for several reasons.

In this chapter I cover internal structures and access methods. It was very important for me to depict internal structures and diagrams of access methods graphically as accurately and as detailed as possible. I created many figures and worked on each for many hours at the pixel level. Several figures took a day of work each. I actually took snapshots of real internal structures like pages of B-Trees with DBCC PAGE and DBCC IND commands, and incorporated real bits of data in the diagrams to make them as reliable as possible.

Also, the chapter appears early in the first volume. I wanted it early in the books to provide important foundations and background that is required for the rest of the books. But I had to include queries used for performance analysis that use advanced techniques that I explain later in the book.

There were also other obstacles.

This was the chapter I wrote last, and it took me several very draining weeks to complete. It ended up with over 100 pages. There’s a lot of query tuning coverage throughout the books, but it’s not the tuning part that was hard, rather the aforementioned reasons.

Q. Why do you write technical books?
A. I’m a teacher in heart and sole. I think that writing is an integral part of teaching. It’s a flowing river; you acquire knowledge and you pass on your knowledge to others. It’s very hard and requires a lot of discipline, but ultimately it’s a process from which you also gain a lot.
Also, there’s a more selfish reason. As I said, I’m a teacher and I want students in my classes. Books give you exposure and I hoped the books would get more students in my courses.

Q. Who are your favorite authors?
A. JRR Tolkien, William Goldman, Douglas Adams, Edgar Allan Poe, Niccolo Machiavelli, Yamamoto Tsunetomo, Miyamoto Musashi, Sun Tzu, J.K. Rowling

By the way, some of my favorite authors did not write any books, yet. :)

Here are a few examples:

David Campbell, who is the General Manager of Strategy, Infrastructure and Architecture of Microsoft SQL Server used to participate in the past in some private SQL newsgroups back in the days when he owned SQL Server’s Storage Engine. His depth of knowledge, teaching and explanatory skills, passion, humility, great English, all were evident in his posts. I asked him to write the foreword to the T-SQL Querying book; you have to read it to understand what I’m talking about. It’s pure joy to read his writings, and I sorely miss his participation in the newsgroups. I wish he had written books back in the days when he was very deeply involved in the technicalities and small details of SQL Server’s engine.

Roy Harvey, a fellow SQL Server MVP. He’s active in the SQL Server community especially in the SQL Server Programming newsgroup. When you read his posts you feel the experience, wisdom and humility flowing from them. I would have very much loved to see work published by him, but you know, some very wise and knowledgeable people shy away from publicity and exposure—unfortunately for us.

Steve Kass is another good example. Steve was the technical editor of my books and there are no words that I can use to describe his capabilities. He has a very unique combination of knowledge, logic, mathematics, SQL, genius, perfect English, and a way of explaining things in a simple manner. Oh, how much I’d love to see his work published in books some day!

I also like reading posts by David Portas; another fellow MVP. I’d love to see a book on modeling published by him some day.

Q. What SQL Server books are on your bookshelf?
A. With your permission, I’d like to broaden my reply to books covering topics that I believe are important for SQL practitioners. On my bookshelf you will find The Art of Computer Programming by Donald E. Knuth, books about logic puzzles, The Thirteen Books of The Elements by Euclid, Elements of Set Theory, Graph Theory, Regular Expressions Pocket Reference by Tony Stubblebine, Mastering Regular Expressions by Jeffrey E. F. Friedl, several SQL books by Joe Celko, SQL-99 Complete, Really by Gultzan and Peltzer, Inside Microsoft SQL Server (various versions; early by Ron Soukup, later by Kalen Delaney), The Art of SQL by Stephane Faroult with Peter Robson, and others…

You can find a more extensive list here:

http://www.sql.co.il/books/insidetsql2005/resources.htm#Books

I have to say though that I probably spend more time writing stuff than reading…

Q. I know you like math and logic puzzles, do you think it is beneficial for a SQL developer to be somewhat good in math?

A. Yes, but I would say it is beneficial to be good in logic and not necessarily generally in math. That’s why I included a whole appendix with pure logic puzzles in the T-SQL Querying volume. Some people find it odd, while people that truly follow the SQL way find it natural (at least that’s what I believe). Almost every SQL querying problem in its hart is a logic puzzle. It’s very common to see that people who are deeply immersed in SQL are also very fond of logic puzzles. It’s not by chance. My belief is that if you practice logic puzzles you can improve your SQL. Also, logic puzzles are fun and can be practiced by the whole family.

And there’s another reason to dwell on logic—one coined by Sir Arthur Conan Doyle:

"Crime is common. Logic is rare. Therefore it is upon the logic rather than upon the crime that you should dwell."

Amazon Link: Inside Microsoft SQL Server 2005: T-SQL Querying
Sample Chapter: http://www.sql.co.il/books/insidetsql2005/sample_chapters.htm#T-SQL_Querying_Sample_Chapters

Wednesday, November 29, 2006

Interview With SQL Server MVP Louis Davidson: Author Of Pro SQL Server 2005 Database Design and Optimization

After interviewing Ken Henderson I decided to try my luck with Louis Davidson. By now you already know that he said yes ;-). Louis is the author of Pro SQL Server 2005 Database Design and Optimization and a SQL Server MVP. I have a copy of his book and I highly recommend it to anyone who wants to learn about designing and optimizing databases. The question-and-answer session with Louis that follows was conducted via email.

Louis, the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?

I have been involved with newsgroups since back in college when I was an avid reader of rec.music.beatles. And when I found the SQL Server groups (back then, before Microsoft’s groups it was comp.databases.ms-sqlserver), I started reading them, and finally giving some feedback. I then spent time in the newsgroups trying to answer people’s questions to help them, and help me extend my knowledge past school and technical books; because the real situations that people get into are more interesting than anything you can get in a textbook.

I quit hanging around for quite a while because the public newsgroups (just got too doggone noisy. I mean come on, there is only so often that one can have certain, “personal” products sold to them.

But about three years ago, I went to a session with Kalen Delaney (who was then, and is now one of the people who I look up to greatly,) who was talking about things to do to promote your books. I started then trying to answer three questions a day (or more) on the Microsoft newsgroups (instead of the public ones) and now on the MSDN Forums (http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=85&SiteID=1).

I don’t know that it helped my book sales in any overwhelmingly large way, but it certainly has helped me in my career. I find that so many things I have learned while helping others end up being useful at my day job. Plus, my philosophy in regards to SQL Server is simple: the more people who like SQL Server the more likely that the product will be around when I retire in 30-40 years.

I have noticed that you have explained the 12 rules of Codd, do you think that it is important to know these rules as a database developer?

Absolutely, if you don’t know why you do something, then it is just too easy to think that it isn’t worth it. And you can’t really bend the rules until you know them. Codd’s rules are the foundation that the RDBMS was built upon, which the begat normalization. Once people get that SQL Server is essentially a tool largely created to manipulate and serve up normalized data (a concept that was built upon the original 12 rules) it actually becomes essential to normalize, not just something that some square geezer is telling you that to do.

Why did you decide to write technical books?

Two reasons:
1. So I would have a reference to refer to when I need to recall something.
2. Because I hate having free time.

Seriously, I had really done quite badly in my senior level database design class in college, but good enough (I think I made a C.) So when I was thrust into the database programmer role, I wanted to learn how to do it right (something my father had not so subtlety driven into my brain years before.) What made things even better was that I had several mentors who believed in doing things right during my first six or seven years of programming, so I learned essentially only good practices, but I still wanted more. I couldn’t find a book on database design that was not written as a text book, so I suggested a book on database design to the editors at WROX, whom I was working for as a tech editor on a Visual Basic/SQL Server book. I expected to tech edit this design book, or possibly write a chapter or two, but before I knew it they had contracted me to write the entire book (Chuck Hawkins wrote a chapter for me, but I wrote the rest) and paid me what seemed like a ton of money (of course, if I average it out to an hourly wage, I averaged far less than minimum wage when all was said and done.)

The process of writing the book was an amazing journey, one because I had never written a paper greater than like 15 pages (my degree is in CS, so all of our papers were really quite small in comparison) and I had to learn a good deal of stuff along the way because I was not really an “expert” at the time (and the more I know, the less I feel like an expert now!) The thing about writing is that you have to be more correct than in almost any medium. When you speak on a subject, if you say something, the people in the audience go, “whu?” and if you talk fast enough and right enough, they will likely forgive you (as long as you don’t do it again.) But if you mess up in writing in a book, it just hangs there over your head forever.

When it was over I had something I was proud of and I said what every technical book writer I have talked with says: “NEVER AGAIN! Too much work, not enough gain, I could make more money cleaning out stalls at the local pet store!!!” I have now said more or less the same thing three times now, and I hope to say it again around the time Katmai ships.

Which chapter was the hardest to write and can you explain why?

Well, the ones that were the hardest were actually the easiest, since I got the Kevin Kline and Kurt Windisch to write them. These guys were excellent co-authors, and did great work. Kurt’s section on CLR programming is great, with some awesome example code that I often use when I need to build a CLR function. Kevin wrote a chapter about cross-platform development that rounded out the book.

Of the chapters I wrote, the most troublesome was probably the security chapter. Security is a very complex topic and there are many, many ways to enforce security. Even worse, there are too many ways to get around it once you have it set up. It is the one chapter that I am not completely happy with, and the one that I plan to do some writing on again in the next few months, most likely after the holidays, during which I hope to do very very little that isn’t concerned with eating, sleeping, or going to Disney World.

What is the audience for this book?

Well, with this being my third book I was kind of hoping that my audience had grown to Dogbert
-like proportions, but I have to be reasonable. I would simply be satisfied if I could get every SQL Server programmer to read it (and buy it!) The overall goal was to make a practical guide going through the boring bits (theory, modeling), the stuff you hear about but rarely really understand (normalization) and to include the more practical topics like constraints, triggers, security, etc that even people who have years of experience would like.

I think that most people will get something from the book, from the newbie to the pro. Of course the payoff will be far less for someone who already knows the theory, but there is something in there for everyone that will at least be worth your 60 bucks (which is the full retail price. Don’t pay full retail price!)

I also try to keep it as light as possible, considering the very dry topic. There are several little humorous bits thrown in, and for fun I added a hidden message for a contest to run later (actually going on right now in the end of 2006.) All of the clues are leading to a message that I “encrypted” into a bit of code in the book (and yes, that is bonus clue for reading through this really long interview!) for a bit of fun.

Ultimately, I want the book to be something that I would purchase in each version at the very least for the practical aspects of the book. You might think that since I wrote the book I wouldn’t refer to it, but I often use the templates for triggers, procedures, error handling, etc from the book whenever I need to write one without using some tool.

Which SQL Server books are on your bookshelf?

Well, to tell the truth, I have quite a few on my desk and no time to read them. I have thumbed through Itzik’s book, “Inside Microsoft SQL Server 2005: T-SQL Programming”, and it is pretty darn good. Dejan Sarka, who was one of my technical editors worked on that book too. Beauchemin and Sullivan’s “A Developer’s Guide to SQL Server 2005” seems pretty good, and I have used Tachev’s book, “Applied Microsoft Analysis Services 2005” several times to try to figure out how different Analysis Services was in the 2005 version.

And I would be remiss to not mention Kalen’s previous “Inside SQL Server 2005” book. What a great reference that book was. I can’t wait to get my hands on her new books (not that I would have the time to read them either!)

What are/could be some consequences if your data model is not in Third Normal Form or even First Normal Form?

The earth could stop spinning on its axis, at least that is what I try to tell people. The more reasonable answer is that you will have nagging, annoying issues with your data that will require more and more management just to keep straight.

Normalization pushes data consistency issues to the foreground and forces you to deal with them. For example, take first normal form. One of the rules of first normal form is that every row is unique. A constant question on newsgroups/forums is: “I have duplicate rows, how do I get rid of them?” Well, go through this following messy set of steps, and then PROTECT YOURSELF USING KEYS!
Another concern of first normal form is scalar attributes. I like to define this as keeping your data broken down to the lowest form you will manipulate using SQL. So if you have to use substring for any common operation, they you likely have an issue. For example take an address, if you use addresses just as the user types them in, then just have a text column, put the address in it. But if you have a mailing system and you have to cleanse the data, and then break the address down into all of its constituent parts instead of going through the process every time you need an address. SQL is great for taking bits and pieces and constructing what you need, but lousy at breaking things down.

For third normal form, the problems are more ominous. If the data in the table isn’t really describing the thing that the table is modeling, you will end up with redundant data. Redundant data doesn’t sound so bad, until you have need to update the data. Consider the case where you have a table modeling a person, and in this table you have the person’s manager, the manager’s name and address. From this sentence, it almost sounds plausible and correct. It certainly is something you might want to see on a screen. But is the manager just the manager for the one person? If not you might have ten copies of the managers information. Now you have to make sure and deal with all ten copies, which is a real pain, to say the least.

How did the massive changes between SQL Server 2000 and SQL Server 2005 affect the research for your book?

The beauty for me was that the truly massive changes made little difference. Since the book I have written is about database design, things haven’t changed so incredibly much from that angle in any version of SQL Server. SQL evolves and gets better and there are optimization changes, but the basics remain the same. Fortunately as machines get more powerful we need to tweak performance less and less and we can just focus on data integrity and usability through normalization.

The biggest differences that I am interested in from an authoring standpoint have been in the ability to do more with T-SQL (I wrote the T-SQL for Developer’s chapter in the Pro SQL Server 2005 book for Apress), and now using the CLR functions. I had Kurt Windisch (a friend who was a board member for PASS at the time) write a very large CLR section, and I am so glad. Now I have a nice reference and excellent code samples for CLR objects. It is one of the great things about a book like this in that it is not overview book where you simply give the new features and come up with some reason you might want to use the features. If the feature isn’t tremendously useful, we can say that it isn’t useful. I think we got it pretty much right by saying that functions are really the most useful part of the new CLR objects, as well as to a lesser extent aggregations.

I admit that I don’t cover XML, something that I lament occasionally. I had waffled on including it for months because I just don’t feel that it is that important to OLTP systems, but on the other hand it could certainly be useful to implement a set of sparse properties (a topic I do touch on) and had arranged for another coauthor set up to write me a chapter about XML, but they dropped out too late to get another one. Even today I am considering commissioning an “add-on” chapter about XML to send out to readers (if you want it, vote with your email to drsql@hotmail.com and your wallets by purchasing the book so I can afford to pay someone. )

Name three things that are new in SQL Server 2005 that you find are the most valuable?

Dynamic Management Objects. Wow, I can’t even begin to start on just how great these are. The best one to me is outlined in this blog entry: http://drsql.spaces.live.com/blog/cns!80677FB08B3162E4!1443.entry . Automatic index tuning is awesome. I have used the basis of this system to optimize a third party system in excellent ways using the number two feature.
Included columns on indexes. These have really made it easier for dba’s to optimize queries by building covering indexes for queries without bloating indexes to death. What’s more, because you aren’t limited to 900 bytes, you can cover some queries in larger, more unwieldy systems that you just couldn’t do before. They aren’t the end all as they still have to be maintained, but they are certainly excellent in many situations. We have used them extensively to optimize a third party system that was very heavy on reads to cut down tremendously on full table scans.
The new BI Paradigm. The attribute oriented nature of Analysis Services is just plain cool. It gives users this extremely rich set of possibilities for viewing data using their cubes that is just so much better than they had in 2000. It was a tremendous change in gears though, and I am very glad that the fundamentals of dimensional modeling didn’t change, so the Kimball paradigm data warehouse we had was still quite viable using either platform, just easier to use.

Name three things that you wish would be in Katmai (the next version of SQL Server)?

I should have finished this interview a few weeks earlier, and I could have spoken a bit more freely about this subject, but at this point I am under NDA for what I have heard about Katmai. What I will say is that I hope to see as much work done in the relational arena (T-SQL and the engine) as they do for the other areas like XML, or tools. The heart and soul of SQL Server should always be right there, lurking in the name: SQL. All of the other new stuff they have added over the years has made things better for us, but I would have traded many, many non relational features to have had the ROW_NUMBER() and RANK() functions in an earlier version.

Of course, this is just my very selfish view on things as a Data Architect/Relational Programmer. In this particular case, even more than normal, I particularly speak only for me.

How critical is a proper data model for a good performing database?

Freaking essential! This is like asking how important is it to know the race route to win the race. The SQL Server engine is amazingly good at serving up data, but it is up to the implementer to know what data is needed and to put the data into a format that works the way that SQL Server works.

I would also be remiss if I didn’t take this one step further. How can you build anything without design? Not one programmer would call up a contractor and say “build me a house, three bedrooms, two baths” and then just let the contractor sketch out a rough drawing and then start working would they? And while I realize that it seems that software is a lot more pliable than building a house, once you have built the foundation classes, the database, and started a UI, rework can be just as costly as running a bulldozer though a regular house, and on a particularly a large software project, far more expensive.

What are some of the biggest mistakes that people make when designing a database?

I did something this for Greg Low’s really cool SQL Down Under podcast (where you get a good idea that I am terrified of speaking publicly) but lets just boil it down to two here:

Treating database design like application design - The reason why the concept of a distinct business layer is so intriguing to me is that we can transform the user’s views/needs from the GUI through this layer to the data layer. Each layer of the application has it’s own needs (UI, Object Layer Representation, Data Access Layer, Stored Procedure Layer, Database Structures) each very important in it’s own right. More importantly, each has very different design goals.
Forgetting about the future – During the design phase of any project, changes are very, very cheap. So if you realize you want to have a table with a domain of values instead of a simple varchar column, it is as simple as adding a rectangle to your model and connecting rectangles. Same goes with the discovery that the cardinality between two of your tables was not what you originally thought (“oh, a person is allowed to have more than one phone number in this system.”) However, as time passes, and more code is written, and more data is added to the system, the costs go up almost exponentially. Of course when the project manager has saddled you up and is giving you the crop, demanding results (that is their job after all) it can be all too easy to simply ignore the future and start coding. Always think about Future-You and how annoyed he is going to be when he realizes that had you just spent another hour three months ago, two weeks of work (of course corresponding to Future-You’s vacation to Disney World) could have been avoided.

There are so many others, bad naming, poor constraint usage, under-using stored procedures, ignoring security, weak testing, and so on and so on. But I think I can safely say that if you start with these two concepts, a great number of these bad habits will correct themselves.

Can you list any third party tools that you find useful to have as a SQL Server developer/admin?

I have become quite enamored with the Red-Gate tools, especially the database compare tool. I use it quite often to compare databases to find changes that have been made, especially for indexes and stuff done for tuning a database. The Apex tools are pretty good too.

The one tool I have used a lot lately has been the Automatic Indexing stuff from the query optimization team: (http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx). I am working right now on optimizing a homegrown/third party “object oriented”-esque tool that has a great deal of object abstraction built in, and have found tremendous performance gains from some really esoteric indexes this thing added to some of the queries that I would have never thought of in a month of Sundays. I wrote about it in more detail here: (http://drsql.spaces.live.com/blog/cns!80677FB08B3162E4!1443.entry and yes, I know this is the second time I have mentioned it, it is that important!)

In general, I am not so much a tool guy as I am a script guy. I use the UI to script out objects quite often, but then I try to edit everything with T-SQL so I know how it is done. The people who get too comfortable in the wizards or GUI often forget or don’t learn all of the tricks that you can do with the raw SQL commands, because it would just be too hard to build the perfect UI. So they give you 50% in the wizards and 90% in the GUI, and the rest is “advanced” stuff. If you don’t know about the nuts and bolts, you might be tempted to say “SQL Server doesn’t…”.

Lastly I have always been a big fan of ERwin for modeling. It does a really good job of representing my structures, and I like their macro language for building very customized scripts to build tables/views/procedures/etc.

What is the biggest Database in GB/PB that you have ever worked with, how many rows was your biggest table ever?

It was about ten feet across, six feet high. I had just started my new job as a LAN administrator and occasionally I had to fill in for the night operator. We had these 14 inch across reels of something that looked like a reel to reel tape, but they….wait, do you mean gigabytes and petabytes? I was thinking Great Big, or Pretty Big.

Well definitely no petabytes, and only recently have we started thinking in terms of terabytes, some of which is due to the fact that we have started storing everything as Unicode, essentially doubling the size of the database. We probably have around a half of a terabyte in our OLTP system and Data warehouse combined. I am never too excited about the size of a database as I am of the number of rows in a key table. You could have 10 billion rows and a very small database because it was designed nicely, or to the contrary, a million rows and have a terabyte if you just stored all of your data in text blobs (which is a favorite kind of storage for too many people.)

The largest database I designed I never got to see come to fruition because our company went through a couple of downsizing “events”, which I was part of the next to last one. I only know it was large because a prior coworker of mine called me up one day and told me we had gone past the 2 billion row mark and they were to changing to use bigint rather than basic int types for the surrogate keys. Tons of rows, but really they were quite small rows. We were keeping information on Internet equipment that the company had deployed. It was a pretty cool system that I wish I could see in action today (if it is still being used. The concept was neat, but as the implementers we were never 100% sure if the data would be useful later in the process.)

When did you start blogging and why?

When is easy, almost two years ago when I started writing my last book. The answer to why is pretty much the same as the answer to the technical books question, but increased an order of magnitude (as a reminder, that answer was: Two reasons: so I would have a reference to look into when I need it; and because I hate having free time.)

Blogging is possibly the biggest time sink on earth really. When I am working a “normal” 40 or so hour week in my day (read: paying) job, it is easy to spend an hour or so a day blogging, but when my job squeezes my time, it gets really hard.

But the most exciting part of blogging is the incredible active reference it gives me, in a format that really works for me. I can also use more of a shotgun approach and just write about what I want to write about on any topic, SQL or otherwise, rather than just writing about database design exclusively.

Over the years, I have tried creating libraries of code on my laptop, but that never has seemed to work for me. As I have aged (a little) I have started to realize that commenting, blogging, writing, etc is not for you, it is for Future Me and other people. I have so much code on my laptop from the past 10 years that I don’t know what the heck to do with it. I have VB6, SQL from past ages, failed article attempts, diagrams, etc all over my projects directory. So Past Me (that jerk!) is punishing me with his laziness, something I now vow not to do to Future Me. Confused? Good.

I started to realize that my blog had gotten out of control with so much information that I could no longer search and find when I found OfficeLive.com, a Microsoft site that allows you to create a free website for your small business. So I created drsql.org as a reference for my books, presentations, and my blog. On there you will find a decent enough index to my blog entries, usually within a month of being up to date, as well as a list of about ten articles that I can get to from the front page just about instantly.

About the book:
By Louis Davidson, Kevin Kline, Kurt Windisch
ISBN: 1-59059-529-7
672 pp.
Published: May 2006

Table Of Contents
CHAPTER 1 Introduction to Database Concepts
CHAPTER 2 Data Modeling
CHAPTER 3 Conceptual Data Modeling
CHAPTER 4 The Normalization Process
CHAPTER 5 Implementing the Base Table Structures
CHAPTER 6 Protecting the Integrity of Your Data
CHAPTER 7 Securing Access to the Data
CHAPTER 8 Table Structures and Indexing
CHAPTER 9 Coding for Concurrency
CHAPTER 10 Code-Level Architectural Decisions
CHAPTER 11 Database Interoperability
APPENDIX A Codd’s 12 Rules for an RDBMS
APPENDIX B Datatype Reference
INDEX

Sample Chapter:
Ch. 02 - Data Modeling

Amazon Links: