SQL Server Code,Tips and Tricks, Performance Tuning: Howto


Saturday, November 26, 2016

BULK INSERT with a file containing linefeeds gotcha

I blogged about this before, but someone ran into this issue again at work, so I thought it made sense to revisit it.

I prefer to use BULK INSERT or bcp as much as possible, especially if all I need to do is dump the file into a table. If there is a need to do complex stuff, I will use SQL Server Integration Services or Informatica. These days files are generated by all kinds of systems: Windows, *nix, Mac, Raspberry Pis, IoT systems and many others.

If you get an error importing one of these files, your first course of action should be to open the file in something like Notepad++ or EditPlus. Both have more functionality than Notepad and are many times faster. The nice thing about either of them is that you can see control characters.

See how you can see the linefeeds here? You cannot do that in Notepad.

So let's say you get a file where the row terminator is a linefeed, how would you specify that as a row terminator in BULK INSERT?

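You can try \n, which stands for newline

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = '\t',
 FIRSTROW =2,
 ROWTERMINATOR = '\n')

Nope, that doesn't work; you get 0 rows inserted. The SQL Server documentation notes that BULK INSERT treats \n as a carriage return plus linefeed (CRLF) pair, which would explain why it never matches a bare linefeed.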

You can try \r, which stands for carriage return

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = '\t',
 FIRSTROW =2,
 ROWTERMINATOR = '\r')

Nope, that doesn't work either; you get 0 rows inserted

What about \l for linefeed?

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = '\t',
 FIRSTROW =2,
 ROWTERMINATOR = '\l')

You get an error

Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (SomeDate).

What if you try CHAR(10), which is a linefeed?

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = '\t',
 FIRSTROW =2,
 ROWTERMINATOR = CHAR(10) )

You get this error

Msg 102, Level 15, State 1, Line 5
Incorrect syntax near 'CHAR'.

How about embedding it by using dynamic SQL?

DECLARE @cmd varchar(1000)
SET @cmd = 'BULK INSERT SomeTable
FROM ''D:\JunkdrawImportMe.txt''
WITH ( FIELDTERMINATOR = ''\t'',
 FIRSTROW =2,
 ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC(@cmd)

The dynamic SQL solution works; that was my original answer to this.

What about if we use the hex code for line feed 0x0a?

BULK INSERT SomeTable
FROM 'D:\JunkdrawImportMe.txt'
WITH (FIELDTERMINATOR = '\t',
FIRSTROW =2,
ROWTERMINATOR = '0x0a') -- CHAR(10) or '\l' does not work
-- 0x0a is the hex code for linefeed, CHAR(10)

Bingo, that works as well. I actually prefer this; who wants to muck around with dynamic SQL if there is an easier way?
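If you don't have an editor handy that shows control characters, you can also ask SQL Server itself which terminator a file uses. The query below is only a rough sketch: it assumes the file is a single-byte (ANSI/ASCII) text file small enough to load as one value, and that the SQL Server service account can read the path used earlier in this post. It uses OPENROWSET(BULK ..., SINGLE_CLOB), which returns the whole file in a column named BulkColumn.

```sql
-- Load the whole file as one varchar(max) value and look for terminators.
-- The order of the checks matters: CRLF is tested before a bare LF or CR.
SELECT
    CASE
        WHEN CHARINDEX(CHAR(13) + CHAR(10), f.BulkColumn) > 0 THEN 'CRLF (0x0d0a)'
        WHEN CHARINDEX(CHAR(10), f.BulkColumn) > 0 THEN 'LF only (0x0a)'
        WHEN CHARINDEX(CHAR(13), f.BulkColumn) > 0 THEN 'CR only (0x0d)'
        ELSE 'no line terminator found'
    END AS RowTerminatorGuess
FROM OPENROWSET(BULK 'D:\JunkdrawImportMe.txt', SINGLE_CLOB) AS f;
```

This is a guess, not proof: a file with mixed line endings will report CRLF as soon as it contains one such pair.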

That's all. Keep this in mind the next time you get a file with linefeeds and you are struggling to import it.

Sunday, November 20, 2016

How to read the errorlog on SQL Server installed on Linux

I finally installed SQL Server on Linux. While I was messing around with some queries, I noticed that the following command fails with an error

exec xp_readerrorlog

The error is

Started executing query at Line 1
Msg 22004, Level 16, State 1, Line 0 Failed to open loopback connection. Please see event log for more information.
Msg 22004, Level 16, State 1, Line 0 Error log location not found.
Total execution time: 00:01:01.34

So how do you look at the log on Linux then? I don't have SSMS installed on the host machine where the Linux virtual machine lives, and SQLCMD gives the same error. I don't know if it even works in SSMS from a Windows box.

Edit: so it looks like it works from SSMS

@DenisGobo SSMS reading SQL logs works. Also quick tested xp_readerrorlog 0, 1 - works fine as well.
— Andy Yun (@SQLBek) November 20, 2016

To answer this question: the errorlog is stored in /var/opt/mssql/log. You need to be a superuser to access this location.

So in my case, I can just execute the following from a terminal in Ubuntu

sudo cat /var/opt/mssql/log/errorlog

And it looks like this

There you have it, this is how you can look at the error log

To see all my SQL Server on Linux posts, click here: SQL Server on Linux

Wednesday, March 02, 2016

Easy running totals with windowing functions

Back in the pre-SQL Server 2012 days, if you wanted a running total, you had to use either a correlated subquery or a variable. The subquery approach was slow because the query that did the sum was executed for each row. With the additions to the windowing functions in SQL Server 2012, this now runs much faster.

Let's take a look, first create the following table

CREATE TABLE #test(Id tinyint,SomeDate date, Charge decimal(20,10))

insert #test
SELECT 1,'20120101',1000
UNION ALL
SELECT 1,'20120401',200
UNION ALL
SELECT 1,'20120501',300
UNION ALL
SELECT 1,'20120601',600
UNION ALL
SELECT 2,'20120101',100
UNION ALL
SELECT 2,'20130101',500
UNION ALL
SELECT 2,'20140101',-800
UNION ALL
SELECT 3,'20120101',100

Let's check the data we just inserted into the temporary table

SELECT * FROM #test

The output looks like this

Id SomeDate Charge
1 2012-01-01 1000.0000000000
1 2012-04-01 200.0000000000
1 2012-05-01 300.0000000000
1 2012-06-01 600.0000000000
2 2012-01-01 100.0000000000
2 2013-01-01 500.0000000000
2 2014-01-01 -800.0000000000
3 2012-01-01 100.0000000000

What we want is the following

id StartDate  Enddate    Charge          RunningTotal
1  2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1  2012-04-01 2012-04-30 200.0000000000  1200.0000000000
1  2012-05-01 2012-05-31 300.0000000000  1500.0000000000
1  2012-06-01 9999-12-31 600.0000000000  2100.0000000000
2  2012-01-01 2012-12-31 100.0000000000  100.0000000000
2  2013-01-01 2013-12-31 500.0000000000  600.0000000000
2  2014-01-01 9999-12-31 -800.0000000000 -200.0000000000
3  2012-01-01 9999-12-31 100.0000000000  100.0000000000

For each row, we want the date that the row starts on and the date when it ends, plus a running total. If there is no row after the current row for that id, we want the end date to be 9999-12-31.

So we will use a couple of functions. The first one is LEAD, which accesses data from a subsequent row in the same result set without the use of a self-join. The LEAD part looks like this

LEAD(dateadd(dd,-1,SomeDate),1,'99991231') OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,

What we are doing is subtracting 1 day from the date in the subsequent row (dateadd(dd,-1,SomeDate)).
We are using 1 as the offset since we want to apply this to the next row. Finally, if there is no subsequent row, we want to use the date 9999-12-31 instead of NULL.

To do the running total, we will do the following

SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal

What this means in English is: for each id, ordered by date, sum up the charge values from the first row for that id through the current row. Here is what each piece means.

ROWS BETWEEN
Specifies the rows that make up the window frame, given as a starting boundary and an ending boundary.

UNBOUNDED PRECEDING
Specifies that the window starts at the first row of the partition. UNBOUNDED PRECEDING can only be specified as a window starting point.

CURRENT ROW
Specifies that the window starts or ends at the current row when used with ROWS, or at the current value when used with RANGE.
CURRENT ROW can be specified as both a starting and an ending point.
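The difference between ROWS and RANGE only shows up when the ORDER BY column has ties. Here is a small sketch of that, using a made-up table variable (@t and its values are hypothetical and not part of the example in this post), in which two rows share the same date.

```sql
-- Hypothetical data: two rows share 2012-01-01
DECLARE @t TABLE (SomeDate date, Charge int);

INSERT @t VALUES ('20120101', 10), ('20120101', 20), ('20120201', 30);

SELECT SomeDate, Charge,
       -- ROWS counts physical rows up to and including the current one
       SUM(Charge) OVER (ORDER BY SomeDate
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowsTotal,
       -- RANGE includes every row that has the same SomeDate as the current row
       SUM(Charge) OVER (ORDER BY SomeDate
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RangeTotal
FROM @t
ORDER BY SomeDate;
```

With RANGE, both 2012-01-01 rows should show 30, because tied rows are treated as peers. With ROWS, the two tied rows get different totals, and which one comes first is not guaranteed unless the ORDER BY is made unique.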

And here is the query

SELECT id, someDate as StartDate,
LEAD(dateadd(dd,-1,SomeDate),1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM #test
  ORDER BY id, SomeDate

And running that query gives us the running total as well as the dates

id StartDate  Enddate    Charge          RunningTotal
1  2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1  2012-04-01 2012-04-30 200.0000000000  1200.0000000000
1  2012-05-01 2012-05-31 300.0000000000  1500.0000000000
1  2012-06-01 9999-12-31 600.0000000000  2100.0000000000
2  2012-01-01 2012-12-31 100.0000000000  100.0000000000
2  2013-01-01 2013-12-31 500.0000000000  600.0000000000
2  2014-01-01 9999-12-31 -800.0000000000 -200.0000000000
3  2012-01-01 9999-12-31 100.0000000000  100.0000000000
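For comparison, here is roughly what the pre-2012 correlated subquery approach could look like against the same #test table. This is a sketch of the general technique rather than code from an older post, and it assumes #test still exists in your session.

```sql
-- Pre-2012 style: the inner query runs once per outer row,
-- summing every charge for the same Id up to the current date
SELECT t1.Id, t1.SomeDate AS StartDate, t1.Charge,
       (SELECT SUM(t2.Charge)
          FROM #test t2
         WHERE t2.Id = t1.Id
           AND t2.SomeDate <= t1.SomeDate) AS RunningTotal
FROM #test t1
ORDER BY t1.Id, t1.SomeDate;
```

The RunningTotal column should match the windowed version, but the work grows with the number of rows per Id, which is why the windowing function tends to win on larger tables.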

That's all for this post...

Tuesday, April 22, 2008

How to rename a column in a SQL Server table without using the designer

If you have a table and you want to rename a column without using the designer, how can you do that?

First create this table

CREATE TABLE TestColumnChange(id int)
INSERT TestColumnChange VALUES(1)

SELECT * FROM TestColumnChange

As you can see, the SELECT statement returns id as the column name. You can use ALTER TABLE ... ALTER COLUMN to change the data type of a column, but not its name.

Here is what we will do, execute the statement below

EXEC sp_rename 'TestColumnChange.[id]', 'value', 'COLUMN'

Now do the select, you will see that the column name has changed

SELECT * FROM TestColumnChange
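If you would rather confirm the rename from metadata than from a result set header, a query along these lines should do it. It assumes the TestColumnChange table from above and uses the sys.columns and sys.types catalog views.

```sql
-- List the columns of the table with their data types;
-- after the sp_rename call, the only row should be named value
SELECT c.name AS ColumnName, t.name AS DataType
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('TestColumnChange');
```

Keep in mind that sp_rename does not update views, stored procedures, or other code that still references the old column name.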

That is it, very simple

Tuesday, July 31, 2007

Cannot resolve collation conflict for equal to operation.

You set up your linked server, you write a query which joins two tables, you execute the query and the error message is this:
Cannot resolve collation conflict for equal to operation

What does this mean? It means that the two columns being compared have different collations.

Let's look at an example. Let's create two tables, one with Traditional_Spanish_CI_AI collation and one with the default. The default collation for me is SQL_Latin1_General_CP1_CI_AS.

CREATE TABLE #Foo (SomeCol varchar(50) COLLATE Traditional_Spanish_CI_AI)
CREATE TABLE #Foo2 (SomeCol varchar(50))

INSERT #Foo VALUES ('AAA')
INSERT #Foo VALUES ('BBB')
INSERT #Foo VALUES ('CCC')
INSERT #Foo VALUES ('DDD')

INSERT #Foo2 VALUES ('AAA')
INSERT #Foo2 VALUES ('BBB')
INSERT #Foo2 VALUES ('CCC')
INSERT #Foo2 VALUES ('DDD')

Now run this query and you will get the error message

SELECT * FROM #Foo F1
JOIN #Foo2 f2 ON f1.SomeCol = f2.SomeCol

Server: Msg 446, Level 16, State 9, Line 1
Cannot resolve collation conflict for equal to operation.

Now add COLLATE Traditional_Spanish_CI_AI to #Foo2 SomeCol

SELECT * FROM #Foo F1
JOIN #Foo2 f2 ON f1.SomeCol = f2.SomeCol COLLATE Traditional_Spanish_CI_AI

That works. If you add COLLATE SQL_Latin1_General_CP1_CI_AS to #Foo SomeCol instead, that will work also

SELECT * FROM #Foo F1
JOIN #Foo2 f2 ON f1.SomeCol COLLATE SQL_Latin1_General_CP1_CI_AS = f2.SomeCol

If you want to know what these collations mean, run the following query (no, that is not a typo, it is indeed ::).

SELECT *
FROM ::fn_helpcollations()
WHERE name in('SQL_Latin1_General_CP1_CI_AS','Traditional_Spanish_CI_AI')

Traditional_Spanish_CI_AI
Traditional-Spanish,
case-insensitive,
accent-insensitive,
kanatype-insensitive,
width-insensitive

SQL_Latin1_General_CP1_CI_AS
Latin1-General,
case-insensitive,
accent-sensitive,
kanatype-insensitive,
width-insensitive for Unicode Data,
SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data
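When you don't know up front which collations are involved, you can look them up, and you can avoid hardcoding a collation name in the join by using COLLATE DATABASE_DEFAULT on both sides. The sketch below assumes the #Foo and #Foo2 tables from this post still exist in your session. Temp tables live in tempdb, so that is where the metadata is read from.

```sql
-- Show the collation of each column involved in the join
SELECT CASE c.object_id
           WHEN OBJECT_ID('tempdb..#Foo') THEN '#Foo'
           ELSE '#Foo2'
       END AS TableName,
       c.name AS ColumnName,
       c.collation_name
FROM tempdb.sys.columns AS c
WHERE c.object_id IN (OBJECT_ID('tempdb..#Foo'), OBJECT_ID('tempdb..#Foo2'));

-- Coerce both sides to the current database's default collation
SELECT *
FROM #Foo f1
JOIN #Foo2 f2
  ON f1.SomeCol COLLATE DATABASE_DEFAULT = f2.SomeCol COLLATE DATABASE_DEFAULT;
```

One trade-off to be aware of: applying COLLATE to a column in a join condition can prevent SQL Server from using an index on that column, so for large tables it may be better to fix the column collation itself.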

Monday, December 12, 2005

Fun With SQL Server Update Triggers

Below is some code that shows how to test for updated column values in an update trigger. As you can see, IF UPDATE(column) is true even when the values don't change. Another thing to keep in mind: if a value changes from NULL to something else or vice versa, and you compare the deleted and inserted tables without using COALESCE or ISNULL, those rows won't be returned. Run the code below to see what I mean.

CREATE TABLE TestTrigger (TestID INT identity,
name VARCHAR(20),
value DECIMAL(12,2) ,
CONSTRAINT chkPositiveValue CHECK (value > 0.00) )

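INSERT INTO TestTrigger
SELECT 'SQL',500.23
GO

-- CREATE TRIGGER must be the first statement in its batch, hence the GO above
CREATE TRIGGER trTest
ON TestTrigger
FOR UPDATE
AS

-- nothing to do if the UPDATE touched no rows
IF @@ROWCOUNT =0
RETURN

IF UPDATE(value)
BEGIN
-- 1: every updated row, whether or not the value changed
SELECT '1', * FROM deleted d JOIN inserted i ON d.testid =i.testid
-- 2: rows where the value changed (misses changes to or from NULL)
SELECT '2',* FROM deleted d JOIN inserted i ON d.testid =i.testid
AND i.value <> d.value
-- 3: the same comparison made NULL-safe; -1 is a safe stand-in
-- because the CHECK constraint only allows values greater than 0
SELECT '3',* FROM deleted d JOIN inserted i ON d.testid =i.testid
AND COALESCE(i.value,-1) <> COALESCE(d.value,-1)
END
GO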

--Let's update the value to 100
UPDATE TestTrigger SET value = 100 WHERE testid =1
--all 3 queries return the row

--Let's run the same statement
UPDATE TestTrigger SET value = 100 WHERE testid =1
--only query 1 returns the row, the value did not change

--Let's really update
UPDATE TestTrigger SET value = 200 WHERE testid =1
--all 3 queries return the row

--Let's update with NULL
UPDATE TestTrigger SET value =NULL WHERE testid =1
--queries 1 and 3 return the row, query 2 does not because it can't compare 200 to NULL

--Let's update with NULL again
UPDATE TestTrigger SET value =NULL WHERE testid =1
--only query 1 returns the row

--Let's update with 300
UPDATE TestTrigger SET value =300 WHERE testid =1
--queries 1 and 3 return the row, query 2 does not because it can't compare NULL to 300

--Let's update with 500
UPDATE TestTrigger SET value =500 WHERE testid =1
--all 3 queries return the row
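The COALESCE trick depends on picking a stand-in value that can never occur in the data. An alternative that needs no stand-in is to compare with EXCEPT, which treats two NULLs as equal. The sketch below shows the idea outside of a trigger, using a made-up table variable (@pairs and its columns are hypothetical) that holds old and new values side by side.

```sql
-- Hypothetical old/new value pairs, including NULL transitions
DECLARE @pairs TABLE (OldValue decimal(12,2) NULL, NewValue decimal(12,2) NULL);

INSERT @pairs VALUES (100, 100), (100, 200), (NULL, 300), (300, NULL), (NULL, NULL);

SELECT OldValue, NewValue,
       -- plain <> evaluates to unknown whenever either side is NULL
       CASE WHEN NewValue <> OldValue
            THEN 'changed' ELSE 'not detected' END AS PlainCompare,
       -- EXCEPT returns a row only when the two values really differ,
       -- and it considers NULL equal to NULL
       CASE WHEN EXISTS (SELECT NewValue EXCEPT SELECT OldValue)
            THEN 'changed' ELSE 'unchanged' END AS NullSafeCompare
FROM @pairs;
```

Only the (100, 200) pair should show as changed in PlainCompare, while NullSafeCompare should also flag the two pairs where one side is NULL. Inside the trigger, the same test could be written as WHERE EXISTS (SELECT i.value EXCEPT SELECT d.value).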