Showing posts with label rdbms. Show all posts
Showing posts with label rdbms. Show all posts

Sunday, August 17, 2008

Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code

Take a look at this query.

SELECT * FROM

  • (

  • SELECT customer_id, ‘MTD’ AS record_type, COUNT(*), SUM(), AVG()

  • FROM payment_table


  • WHERE YEAR(payment_dt) = YEAR(GETDATE())

  • and MONTH(payment_dt) = MONTH(GETDATE())

  • GROUP BY customer_id) MTD_payments

  • UNION ALL

  • (

  • SELECT customer_id, ‘YTD’ AS record_type, COUNT(*), SUM(), AVG()

  • FROM payment_table

  • WHERE

  • WHERE YEAR(payment_dt) = YEAR(GETDATE())

  • GROUP BY customer_id) YTD_payments

  • UNION ALL

  • (

  • SELECT customer_id, ‘LTD’ AS record_type, COUNT(*), SUM(), AVG()

  • FROM payment_table) LTD_payments

  • ) payments_report

  • ORDER BY customer_id, record_type



  • Can you see the problem?
    A person had this query, it would run for over 24 hours. Wow, that is pretty bad, I don't think I had ever written something that ran over an hour, and the ones I did were mostly defragmentation and update statistics jobs.

    The problem is that the following piece of code

    where year(payment_dt) = year(getDate())
    and month(payment_dt) = month(getDate())

    is not sargable. First what does it mean to be sargable? A query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able.

    This query is not sargable because there is a function on the column, whenever you use a function on the column you will not get an index seek but an index scan. The difference between an index seek and an index scan can be explained like this: when searching for something in a book, you go to the index in the back find the page number and go to the page, that is an index seek. When looking for something in a book you go from page one until the last page, read all the words on all the ages and get what you need, that was an index scan. Do you see how much more expensive in terms of performance that was?

    Let's get back to the query, what can we do to make this piece of code use an index seek?
    where year(payment_dt) = year(getDate())
    and month(payment_dt) = month(getDate())

    You would change it to this:
    where payment_dt >= dateadd(mm, datediff(mm, 0, getdate())+0, 0)
    and payment_dt < dateadd(mm, datediff(mm, 0, getdate())+1, 0)

    You can see the complete question on the MSDN forum site here:
    http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=3746751&SiteID=1
    The Person said that his query went from over 24 hours to 36 seconds. Wow!! That is very significant. hardware cannot help you out if you have bad queries like that.

    The same exact day I answered a very similar question, take a look here: http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=3752248&SiteID=1

    The person had this

    AND DATEDIFF(d, '08/10/2008', DateCreated) >= 0

    AND DATEDIFF(d, DateCreated, '08/15/2008') >= 0


    I told him to change it to this

    AND DateCreated >= '08/10/2008'
    and DateCreated <= '08/15/2008'


    And that solved that query. If you are interested in some more performance, I have written some Query Optimization items on the LessThanDot Wiki. Below are some direct links

    Case Sensitive Search
    No Functions on Left Side of Operator
    Query Optimizations With Dates
    Optimization: Set Nocount On
    No Math In Where Clause
    Don't Use (select *), but List Columns


    If you are interested in some blogposts about dates, take a look at these two which I wrote earlier
    How Are Dates Stored In SQL Server?
    Do You Know How Between Works With Dates?