Showing posts with label analytical functions. Show all posts
Showing posts with label analytical functions. Show all posts

Friday, March 04, 2016

Gotcha with the LEAD analytical function and date datatype

The other day some code that used the LEAD analytical function stopped working suddenly. It started to throw an error.

Msg 206, Level 16, State 2, Line 1
Operand type clash: date is incompatible with int

The code wasn't touched in a long time... so what could be the issue. Let's reproduce it here


First create the following table


CREATE TABLE testRunningTotal(Id tinyint,SomeDate datetime, Charge decimal(20,2))

INSERT testRunningTotal
SELECT 1,'20120101',1000
UNION ALL
SELECT 1,'20120401',200
UNION ALL
SELECT 1,'20120501',300
UNION ALL
SELECT 1,'20120601',600
UNION ALL
SELECT 2,'20120101',100
UNION ALL
SELECT 2,'20130101',500
UNION ALL
SELECT 2,'20140101',-800
UNION ALL
SELECT 3,'20120101',100

Running this query to see what is in this table


SELECT * FROM testRunningTotal

Id SomeDate Charge
1 2012-01-01 00:00:00.000 1000.00
1 2012-04-01 00:00:00.000 200.00
1 2012-05-01 00:00:00.000 300.00
1 2012-06-01 00:00:00.000 600.00
2 2012-01-01 00:00:00.000 100.00
2 2013-01-01 00:00:00.000 500.00
2 2014-01-01 00:00:00.000 -800.00
3 2012-01-01 00:00:00.000 100.00


Now run our fancy query...


SELECT id, someDate as StartDate,
LEAD(SomeDate -1,1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM testRunningTotal
  ORDER BY id, SomeDate


Here are our results

id StartDate Enddate Charge RunningTotal
1 2012-01-01 00:00:00.000 2012-03-31 00:00:00.000 1000.00 1000.00
1 2012-04-01 00:00:00.000 2012-04-30 00:00:00.000 200.00 1200.00
1 2012-05-01 00:00:00.000 2012-05-31 00:00:00.000 300.00 1500.00
1 2012-06-01 00:00:00.000 9999-12-31 00:00:00.000 600.00 2100.00
2 2012-01-01 00:00:00.000 2012-12-31 00:00:00.000 100.00 100.00
2 2013-01-01 00:00:00.000 2013-12-31 00:00:00.000 500.00 600.00
2 2014-01-01 00:00:00.000 9999-12-31 00:00:00.000 -800.00 -200.00
3 2012-01-01 00:00:00.000 9999-12-31 00:00:00.000 100.00 100.00


To see what the query actually does, visit this post Easy running totals with windowing functions

Let's take a closer look at the LEAD analytical function, this is what Books On Line has on its page

  LEAD ( scalar_expression [ ,offset ] , [ default ] ) 
    OVER ( [ partition_by_clause ] order_by_clause )

offset
The number of rows forward from the current row from which to obtain a value. If not specified, the default is 1. offset can be a column, subquery, or other expression that evaluates to a positive integer or can be implicitly converted to bigint. offset cannot be a negative value or an analytic function.

Looks all cool to me..nothing that stands out that could cause a problem...



What will happen if someone changes the column from a datetime to a date? Maybe they looked at the data and saw that all the values had only dates and no time components... why use 8 bytes of storage when you can use 3 instead?



To find out exactly what will happen, run the following DDL statement....


ALTER TABLE testRunningTotal ALTER COLUMN SomeDate date


Now go run your query again


SELECT id, someDate as StartDate,
LEAD(SomeDate -1,1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM testRunningTotal
  ORDER BY id, SomeDate

Here is the error you get

  Msg 206, Level 16, State 2, Line 1
Operand type clash: date is incompatible with int


The error is the same as if you would run the following


SELECT CONVERT(DATE,GETDATE() )-1

Instead what you have to do is use the DATEADD function to do arithmetic with dates


SELECT DATEADD(dd,-1,CONVERT(DATE,GETDATE()))


So to fix our query from before, we want to change the -1 to be DATEADD(dd,-1 instead, here is what our query looks like


 SELECT id, someDate as StartDate,
LEAD(dateadd(dd,-1,SomeDate),1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM testRunningTotal
  ORDER BY id, SomeDate

And here are our results, it's all good now

id StartDate Enddate Charge RunningTotal
1 2012-01-01 2012-03-31 1000.00 1000.00
1 2012-04-01 2012-04-30 200.00 1200.00
1 2012-05-01 2012-05-31 300.00 1500.00
1 2012-06-01 9999-12-31 600.00 2100.00
2 2012-01-01 2012-12-31 100.00 100.00
2 2013-01-01 2013-12-31 500.00 600.00
2 2014-01-01 9999-12-31 -800.00 -200.00
3 2012-01-01 9999-12-31 100.00 100.00


I still think that maybe the LEAD function should do this conversion internally since all we are doing is specifying the OFFSET...what is your opinion?