Showing posts with label SQL Server vNext. Show all posts
Showing posts with label SQL Server vNext. Show all posts

Monday, February 20, 2017

Importing The Metropolitan Museum of Art's Open Access Initiative data into SQL Server

The Met just made some data available for download.

The Metropolitan Museum of Art presents over 5,000 years of art from around the world for everyone to experience and enjoy. The Museum lives in three iconic sites in New York City—The Met Fifth Avenue, The Met Breuer, and The Met Cloisters. Millions of people also take part in The Met experience online. 

Since it was founded in 1870, The Met has always aspired to be more than a treasury of rare and beautiful objects. Every day, art comes alive in the Museum's galleries and through its exhibitions and events, revealing both new ideas and unexpected connections across time and across cultures. The Metropolitan Museum of Art provides select datasets of information on more than 420,000 artworks in its Collection for unrestricted commercial and noncommercial use.


I decided to take a look. The first thing we will do is download the CSV file from their GitHub repository. You can find that here: https://github.com/metmuseum/openaccess

If you go to that link, you will see the following, grab the MetObjects.csv file

Be warned, this file is 214 MB.

One word of caution... if you try to import this file with a regular BULK INSERT command....good luck...let me know how many tries you need. No problem, I will just use a format file... and now you have 2 problems..  The real issue is that the file is somewhat problematic, there are quotes where there shouldn't be quotes,  there are no quotes where there should be quotes.  So what do you do?

Are you on SQL Server vNext 1.1 or higher? If you are, good news, you can use BULK INSERT and csv format, this is new in vNext 1.1


Ok let's get started, first create the following table

CREATE TABLE MetOpenData(
ObjectNumber nvarchar(4000),
IsHighlight nvarchar(4000),
IsPublicDomain nvarchar(4000),
ObjectID nvarchar(4000),
Department nvarchar(4000),
ObjectName nvarchar(4000),
Title nvarchar(4000),
Culture nvarchar(4000),
Period nvarchar(4000),
Dynasty nvarchar(4000),
Reign nvarchar(4000),
Portfolio nvarchar(4000),
ArtistRole nvarchar(4000),
ArtistPrefix nvarchar(4000),
ArtistDisplayName nvarchar(4000),
ArtistDisplayBio nvarchar(4000),
ArtistSuffix nvarchar(4000),
ArtistAlphaSort nvarchar(4000),
ArtistNationality nvarchar(4000),
ArtistBeginDate nvarchar(4000),
ArtistEndDate nvarchar(4000),
ObjectDate nvarchar(4000),
ObjectBeginDate nvarchar(4000),
ObjectEndDate nvarchar(4000),
Medium nvarchar(4000),
Dimensions nvarchar(4000),
CreditLine nvarchar(4000),
GeographyType nvarchar(4000),
City nvarchar(4000),
State nvarchar(4000),
County nvarchar(4000),
Country nvarchar(4000),
Region nvarchar(4000),
Subregion nvarchar(4000),
Locale nvarchar(4000),
Locus nvarchar(4000),
Excavation nvarchar(4000),
River nvarchar(4000),
Classification nvarchar(4000),
RightsandReproduction nvarchar(4000),
LinkResource nvarchar(4000),
MetadataDate nvarchar(4000),
Repository nvarchar(4000))

GO

Now it is time to import the file
Just to let you know, you will get a couple of errors, however all data except for these 4 rows will be imported

Msg 4864, Level 16, State 1, Line 62
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 213266, column 25 (Medium).
Msg 4864, Level 16, State 1, Line 62
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 217661, column 25 (Medium).
Msg 4863, Level 16, State 1, Line 62
Bulk load data conversion error (truncation) for row 226222, column 16 (ArtistDisplayBio).
Msg 4863, Level 16, State 1, Line 62
Bulk load data conversion error (truncation) for row 258639, column 16 (ArtistDisplayBio).

Here is what the BULK INSERT with FORMAT= 'CSV' command looks like. Change the filepath to point to the location where you have the MetObjects.csv file saved

BULK INSERT MetOpenData
FROM 'c:\Data\MetObjects.csv'
WITH (FORMAT = 'CSV'); 

Let's do a quick count


SELECT COUNT(*) FROM MetOpenData

The file that I imported resulted in 446026 rows. I downloaded this file on 2/10/2017, your file might have more data if they updated the file after the date I downloaded it


Now that we have the data we need, we can run some queries.
Let's see what kind of objects are in the collection, we will grab the top 15 objects

 SELECT TOP 15  ObjectName,count(*)
 FROM MetOpenData
 GROUP BY ObjectName
 ORDER BY 2 DESC

Here is what the results looks like


ObjectNameCount
Print 88582
Photograph 28071
Drawing 24905
Book 13360
Fragment 9405
Piece 8638
Negative 6258
Painting 5862
Baseball card, print 4985
Bowl 3534
Figure 3081
Baseball card 3046
Polaroid 2706
Vase 2698
Dress 2473

I don't know why..but I somehow thought painting would be the most occuring object..but what do I know

You can also treat this table as you own museum catalog, let's say you want to look at van Gogh's Madame Roulin and Her Baby painting?  No problem, run this query

 SELECT * FROM MetOpenData
 WHERE ArtistDisplayName like'%van%gogh%'
 and title = 'Madame Roulin and Her Baby'

Scroll to the LinkResource column, you will see the following: http://www.metmuseum.org/art/collection/search/459123

Clicking on that link will give you the following


Now you can download this image and do something with it, it is after all in the public domain

Here are a couple of more queries you can play around with



 
 SELECT city, count(*)
 FROM MetOpenData
 GROUP BY city
 ORDER BY 2 DESC

 SELECT Dynasty, count(*)
 FROM MetOpenData
 GROUP BY Dynasty
 ORDER BY 2 DESC

 SELECT Period, count(*)
 FROM MetOpenData
 GROUP BY Period
 ORDER BY 2 DESC

 SELECT ArtistNationality, count(*)
 FROM MetOpenData
 GROUP BY ArtistNationality
 ORDER BY 2 DESC


 SELECT * FROM MetOpenData
 WHERE ArtistDisplayName like'%pablo picasso%'

 SELECT * FROM MetOpenData
 WHERE ArtistDisplayName like'%rembrandt van rijn%'
 
 SELECT * FROM MetOpenData
 WHERE ObjectName like'%Postage stamps%'

I am not a big art person, but if you are and you have some interesting queries that you ran against this data please let me know in the comments

Also if you manage to get this file to import with plain old BCP or BULK INSERT with or without a format file...let me know the magic you used.... :-)


Saturday, January 07, 2017

BULK INSERT and csv format, new in vNext 1.1



In SQL Server vNext 1.1 we now have the ability to import a csv via the BULK INSERT command without having to specify the field or row terminator

 You still need to specify the format, if you only do something like the following

BULK INSERT AlexaSongs  
   FROM 'c:\Songs played with Alexa.csv'  

You will be greeted with these errors


Msg 4832, Level 16, State 1, Line 10
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 10
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 10
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

So let's take a look at how this all works

First create the following table


USE tempdb
GO

CREATE TABLE AlexaSongs(PlayDate varchar(100), 
   SongName varchar(200), 
   Artist varchar(200), 
   Album varchar(200))
GO

Now grab the csv file from here Songs played with Alexa.csv  Either download the whole project and grab the file, or open in raw mode and copy and paste it into a file and save as Songs played with Alexa.csv

Now that you have the file and table ready, let's first take a look at how this was done before vNext 1.1

Here is what it looked like


BULK INSERT AlexaSongs  
   FROM 'c:\Songs played with Alexa.csv'  
   WITH  
     (  
        FIELDTERMINATOR =',',  
        ROWTERMINATOR = '\n'
      ); 

As you can see, we specified a comma as the field terminator and a newline as the row terminator

You could also get it to work by just specifying the field terminator in this case

BULK INSERT AlexaSongs  
   FROM 'c:\Songs played with Alexa.csv'  
   WITH  
     (  
        FIELDTERMINATOR =','
      ); 


So what does the new syntax look like?

Here is the code that accomplished the same as above but by using the new WITH FORMAT = CSV option

BULK INSERT AlexaSongs
FROM 'c:\Songs played with Alexa.csv' 
WITH (FORMAT = 'CSV'); 


I guess you could say it is a little cleaner, but all this really is is syntactic sugar

For Azure, it looks like this, I grabbed this straight from this Books On Line Page here

First you need to create a data source


CREATE EXTERNAL DATA SOURCE MyAzureInvoices
    WITH  (
        TYPE = BLOB_STORAGE,
        LOCATION = 'https://newinvoices.blob.core.windows.net', 
        CREDENTIAL = UploadInvoices  
    );

And then you use that data source

BULK INSERT Colors2
FROM 'week3/inv-2017-01-19.csv'
WITH (DATA_SOURCE = 'MyAzureInvoices',
      FORMAT = 'CSV'); 

For more examples including accessing data in a CSV file referencing a container in an Azure blob storage location go here https://msdn.microsoft.com/en-us/library/mt805207.aspx

That's all for today

Sunday, November 27, 2016

Changing the SQL Server port on Linux


Today, I decided to see how I can change the port SQL Server is listening on



To see what port SQL Server is listening on, you can use the netstat command, here is what the output looks like when I run netstat -tulpn | grep sqlservr


I used grep to limit the output to lines that contain sqlservr only

Next, to change the port that SQL Server is listening on to 3000, I ran the following command

sudo /opt/mssql/bin/mssql-conf set tcpport 3000


I restarted SQL Server bu running the command systemctl restart mssql-server
Then I ran  netstat -tulpn | grep sqlservr again and as you can see the output now has 3000 as the port number

Now I tried using sqlcmd, I ran the following sqlcmd -S localhost -U SA


I got an timeout error

Finally I ran the sqlcmd command again but I added port number 3000, like this
sqlcmd -S localhost,3000 -U SA


As you can see, this worked and I could connect to SQL Server listening on port 3000 when supplying that to the sqlcmd command


How do we reset SQL Server back to the default port?
To reset SQL Server back to use the default port, run the command below

/opt/mssql/bin/mssql-conf unset tcpport

Here is a screenshot of running netstat, resetting the port, restarting SQL Server and running netstat again





You want to learn a little more?


We used tulpn as options in our netstat command. So what do all these options do?

[-t] 
tcp

[-u] 
udp

[-l]
Show only listening sockets

[-p]
Show the PID (process identifier) and name of the program to which each socket belongs.

[-n] 
Show numerical addresses instead of trying to determine symbolic host, port or user names.



On a Linux system to get help for a command, use man, this will bring up the user manual for that command

So if you run man netstat, you will see the following (I only pasted 1 page here, there are more)

man netstat

--------------------------

NETSTAT(8)                 Linux Programmer's Manual                NETSTAT(8)

NAME
       netstat  - Print network connections, routing tables, interface statis‐
       tics, masquerade connections, and multicast memberships

SYNOPSIS
       netstat  [address_family_options]  [--tcp|-t]   [--udp|-u]   [--raw|-w]
       [--listening|-l]     [--all|-a]     [--numeric|-n]    [--numeric-hosts]
       [--numeric-ports]           [--numeric-users]           [--symbolic|-N]
       [--extend|-e[--extend|-e]]  [--timers|-o] [--program|-p] [--verbose|-v]
       [--continuous|-c]

       netstat              {--route|-r}              [address_family_options]
       [--extend|-e[--extend|-e]]         [--verbose|-v]        [--numeric|-n]
       [--numeric-hosts] [--numeric-ports] [--numeric-users] [--continuous|-c]

       netstat {--interfaces|-i} [--all|-a] [--extend|-e[--extend|-e]] [--ver‐
       bose|-v]  [--program|-p]  [--numeric|-n]  [--numeric-hosts] [--numeric-
       ports] [--numeric-users] [--continuous|-c]

       netstat      {--groups|-g}       [--numeric|-n]       [--numeric-hosts]
       [--numeric-ports] [--numeric-users] [--continuous|-c]
 Manual page netstat(8) line 1/348 15% (press h for help or q to quit)


To see all my SQL Server on Linux posts, click here: SQL Server on Linux

Saturday, November 26, 2016

You snooze you lose... I lost....



I was working on a post about the new columns sql_memory_model, sql_memory_model_desc columns in DMV sys.dm_os_sys_info. These columns allow DBAs to programmatically identify if Lock Pages in Memory (LPIM) privilege is in effect at the service startup time.

I was setting it up, had one instance running with pages locked in memory and the other one without. Took some screenshots, then I noticed the MSSQL Tiger Team published the post
Developers choice: Programmatically identify LPIM and IFI privileges in SQL Server

So really there was no need for my post anymore. But what do I do with my beautiful screenshots?

Well not someone who likes to waste stuff, I decided to write just a couple of sentences and put the screenshots in this post and then link to the post by the MSSQL Tiger Team


Here is the query I ran to see instant file initialization is enabled, I ran this against two instances, one a SQL Server 2016 Service Pack instance, the other a SQL Server vNext instance

SELECT instant_file_initialization_enabled,* 
FROM sys.dm_server_services

You can see the output in the pic below, the vNext instance does not have instant file initialization enabled, the 2016 SP1 instance does have instant file initialization enabled



To see if Lock Pages in Memory is in use, you can run the following query


SELECT sql_memory_model,sql_memory_model_desc
FROM sys.dm_os_sys_info


You can see the output in the pic below, the vNext instance does not use Lock Pages in Memory, the 2016 SP1 instance does use Lock Pages in Memory


The sys.dm_os_sys_info DMV also has a new column that SQL Server 2016 SP1 does not have.. process_physical_affinity


To get all the details about this stuff in the pictures go read the post
Developers choice: Programmatically identify LPIM and IFI privileges in SQL Server published by the MSSQL Tiger Team


Sunday, November 20, 2016

Installing SQL Server on Linux, connecting and running queries from Linux


I decided to play around with installing SQL Server vNext on Linux today. I decided to pick the Ubuntu distribution. I created a VM in VirtualBox and then installed Ubuntu 16.04.   After that was done, it was time to install SQL Server.

All this stuff on this page runs on Linux, there are no windows components, if you have just a Linux desktop/server, you are all set


Here are the steps


Import the public repository GPG keys:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -


Register the Microsoft SQL Server Ubuntu repository:

curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server.list | sudo tee /etc/apt/sources.list.d/mssql-server.list


Run the following commands to install SQL Server:

sudo apt-get update
sudo apt-get install -y mssql-server



After the package installation finishes, run the configuration script and follow the prompts.


sudo /opt/mssql/bin/sqlservr-setup


Once the configuration is done, verify that the service is running:

systemctl status mssql-server

I myself like to run ps and then grep for mssql

ps -ef | grep  mssql



Ok so SQL Server is installed. In order to be able to run queries against SQL Server, we need to install the SQL Server tools

Open a new terminal window

Import the public repository GPG keys:


curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -


Register the Microsoft Ubuntu repository:

curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list


Update the sources list and run the installation command:


sudo apt-get update 
sudo apt-get install mssql-tools


You will get a bunch of prompts, just follow it, say YES or y and you are set

Now let's run a command

Run this from a terminal window

sqlcmd -S localhost -U SA
Enter you password
Enter a simple query like SELECT @@version
Type GO, hit enter and voila, you should get results

Here is what it looks like



Here is also a video of the whole process, all of this was done in less than 5 minutes


Here is also a link to the official documentation

Install SQL Server: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-ubuntu
Install the tools: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-tools#ubuntu


Visual Studio Code

Now unless you want to spend the rest of your life in a command window, I suggest you install VS Code

Visit this page https://code.visualstudio.com/#alt-downloads



Now for me .deb didn't work, so I got the tarball (.tar.gz) and manually extracted it

Then I ran the Code application inside the VSCode-linux-x64 folder

The first thing we have to do is install the mssql extension

Click on View-->Extensions, type in mssql, click on the install button

Now open a new file and save it with a .sql extension

Click on View-->Command Palette


Type sqlman


Add the server localhost), db name, username (probably sa) and add the password, save this with a name so you will know what it is


Type some valid SQL, hit CTRL + SHIFT + E or right click and select execute query


You should see something like this


There you have it, SQL Server running on Ubuntu, VS Code running on Ubuntu returning data from the SQL Server instance


I did hit an issue with trying to run xp_readerrorlog, you can read about that here:
How to read the errorlog on SQL Server installed on Linux

If you want to install SQL Server on Docker on a Mac, take a look at Aaron Bertrand's post
VS Code on Mac meets SQL Server on Linux (in Docker)


To see all my SQL Server on Linux posts, click here: SQL Server on Linux

How to read the errorlog on SQL Server installed on Linux



I finally installed SQL Server on Linux, while I was messing around with some queries, I noticed the following command fails with an error

exec xp_readerrorlog


The error is

Started executing query at Line 1
Msg 22004, Level 16, State 1, Line 0 Failed to open loopback connection. Please see event log for more information.
Msg 22004, Level 16, State 1, Line 0 Error log location not found.
Total execution time: 00:01:01.34


So how do you then look at the log on Linux? I don't have SSMS installed on the host machine where the Linux Virtual Machine lives. SQLCMD gives the same error. I don't know if it even works from SSMS from a windows box.

Edit: so it looks like it works from SSMS

To answer this question, the errorlog is stored in this location  /var/opt/mssql/log. You need to be a superuser to access this location

So in my case, I can just execute the following from a terminal in Ubuntu

sudo cat /var/opt/mssql/log/errorlog

And it looks like this


There you have it, this is how you can look at the error log

To see all my SQL Server on Linux posts, click here: SQL Server on Linux

Saturday, November 19, 2016

What we learned from the reddit SQL Server AMA thread on /r/sqlserver



The SQL Server AMA thread on reddit, has finished. I have put 8 most interesting answers here and then gave my take on it, the questions will be linked straight to the question on reddit. The answer will be in italics and my take on it will be below in regular font


What was the reasoning for having SQL Server for Linux run atop Drawbridge, rather than directly interfacing with Linux? It seams like it's not a true Linux port if NT Kernel emulation is required 

I guess you can always debate whether it is better to have an abstraction layer or do a full port. I am a practical person, and like to judge based on results. I myself am not on the Linux team, but am really impressed with what the team has managed to deliver thus far, both in terms of feature set and in terms of performance. A full port would have taken much longer and would have led to a code base that is much harder to maintain. 

I never heard of Drawbridge either. From the Microsoft Drawbridge research page: Drawbridge combines two core technologies: First, a picoprocess, which is a process-based isolation container with a minimal kernel API surface. Second, a library OS, which is a version of Windows enlightened to run efficiently within a picoprocess.



I really like the JSON support you guys added. Is JSON in SQL Server still evolving, or are you guys happy with where it is at?I would love to see a JSON data type, like XML support. Thanks!

We are collecting feedback and we will decide what would be the next step. Current NVARCHAR representation enables us to integrated JSON with in-memory, column store, does not require new types in drivers, etc. Also, some experiments show that text representation is even faster than XML type in some scenarios. The next step would be better integration with native modules that will speed up queries. We would like to know how your are using current JSON, what are limitations, and then we will decide what would be the next step.

It makes sense that they decided to use nvarchar since you can then use it with in-memory OLTP and Columnstore. We currently store json in some of our tables but we never search on it or manipulate it in the DB


How does making SQL Server available for Linux relate (or not relate) to Microsoft's famous "Embrace, Extend, Extinguish" strategy?

Two things - Satya Nadella, cloud economics - changed a lot of the way we think about things here at Microsoft now. The way I like to think of this strategy of bringing SQL Server to Linux is that it maps to Microsoft's new mission statement - "Empower every person and every organization to achieve more". SQL Server on Linux is about bringing the really powerful capabilities of SQL Server to everybody, not just those that are using Windows/Windows Server and seeing what people can do with it. It's inspiring for example to think about what the scientific or academic communities (which are oftentimes on Linux) could now achieve with SQL Server. Let's call it Microsoft's new 'Embrace, Enable, and Empower' strategy. :)

Haha I knew this would be asked, great answer also. We all remember the Ballmer Linux is a cancer quote, how things have changed

I am so sorry, but I just had to add this image  :-)




We do not plan to port SSMS "as is" to Linux/macOS for 2 main reasons: 1) SSMS on Windows depends on the VS2015 Isolated shell which is not available on Linux/macOS and 2) a direct port of SSMS will not align with the design guidelines & information architecture of apps that natively run on Linux/macOS (e.g. buttons, menus, etc. are different, the window chrome is different, etc.)

That said, we are working on a new set of multi-platform SQL client tools that DBAs and database developers can use on Linux/macOS/Windows. The new 'mssql' extension for Visual Studio Code is our first attempt in this area and we are using it to prove out the multi-platform SQL tools stack "backend" we've created.

We don't have specific dates yet but stay tuned for more details in the coming months. Meanwhile, you can try the mssql extension for VS Code on Linux/macOS/Windows from here: https://aka.ms/mssql-marketplace


We plan to have a cross-platform tool that will be Linux native. We don't have an exact date set but it's in the works for 2017. In the meantime, you can connect to SQL Server on Linux through the Visual Studio Code mssql extension as well as SSMS.


So it looks like there will be a cross-platform tool, maybe something like Visual Studio Code, make it web based. As long as it is not based on Java and has that nasty purple shiny look and feel. But in reality, all the people I worked with that used *nix always used windows and then would use PuTTY or something similar to SSH into the box and just do everything from the command line.


will SQL on Linux have an Express edition? (I think it should, to give a better alternative to all those LAMP proponents - though admittedly I think a number of switching to postgres, which is at least better than mysql)


Yes, it will! Cost = free.

This is nice, also it seems that when you license SQL Server in the future, the OS doesn't matter, if you get a 4 core license, you can install SQL Server on Linux or Windows, your choice


Is there any plan to make SQL Server work on Windows Server Nano?

This is something that we are looking into. It would require creating a new appx-style package for deployment. That in itself would be a good thing actually that could make it easier to install SQL Server (like the package based install for SQL Server on Linux) but it would be a very large development effort. Spending dev effort on that would take away from other things we would like to do. It would also mean that we end up having two different methods of patching SQL Server (.msp and appx) which could further complicate servicing. On the other hand, WS Nano is really well suited for SQL Server due to it's higher availability (less patching), smaller attack surface area, fewer services running, etc. Those are the trade offs we are thinking through, but no concrete plans one way or another right now. Let us know what you think we should do!

Installing SQL Server on Linux is really nice, here is what it looks like on redhat

sudo su
curl https://packages.microsoft.com/config/rhel/7/mssql-server.repo > /etc/yum.repos.d/mssql-server.repo
sudo yum install -y mssql-server
sudo /opt/mssql/bin/sqlservr-setup   #follow the prompts
systemctl status mssql-server #verify service is running

It also makes sense to run it on a server that is bare metal, no need for IE or printer service running.


Any chance we will ever get a step by step course that uses the developer edition of SQL server to walk someone through basic use cases of all the applications. For instance, creating a sample database, adding tables, keys. Scripting user permisions for them and encrypting them. Setting up the report server. Making a report. Building a cube, making a small data analysis setup. All in an easy to follow step by step? A basic use case example for r services.

Great minds think alike! We actually just this Wednesday posted our first set of developer tutorials for lots of different languages (node, python, C#, Java, and more) and platforms (macOS, Docker, Linux, and Windows). They cover how to get SQL Server installed, get some tools, get your first basic app going, and how to use some powerful features like columnstore. There is a tutorial for R language/services too. If that seems to be a popular thing, we'll keep going and add more to it. Check it out and spread the word! http://aka.ms/sqldev

You can find R Services samples in the following links:
https://gallery.cortanaintelligence.com/Collection/ML-Templates-with-SQL-Server-R-Services-1

https://msdn.microsoft.com/en-US/library/mt591993.aspx

Additionally, we have a new website focused around easy to use "Getting Started" tutorials. There is one R sample there:
https://www.microsoft.com/en-us/sql-server/developer-get-started/r


let me add a couple more R-related articles :)

https://github.com/Microsoft/sql-server-samples/tree/master/samples/features/r-services/Implementing%20Predictive%20Analytics

and a simple blogpost I wrote some time ago just in case it helps with the 101 https://medium.com/@davidsb/datascience-for-developers-build-your-first-predictive-model-with-r-a798f684752f



Is using the GPU for extra processing power on the horizon?

The trick with GPUs is balancing the cost of moving data onto the GPU vs the compute gains. While I can't comment on the future, we are always looking at ways to improve the performance of SQL Server and leverage the latest hardware to its fullest potential. I'd encourage you to take a look at the "SQL 2016 - It Just Runs Faster" blog series for examples of what the team has done recently.

Here are a few that I like:

  • Improvements in columnstore indexes and batch processing (link)
  • Automatic soft NUMA for large CPU system deployments (link)
  • Hardware acceleration for encryption features like TDE by leveraging Intel's AES-NI instruction set (link)


We evaluate hardware advancements periodically for SQL Server. The new Microsoft ML library that is available with SQL Server vNext uses GPU for the neural net algorithms so you may see such capabilities in other areas of the product.

This is definitely an area we are looking at. But we don't have any concrete at the moment.


I remember when postgres did this, the performance improvement was impressive, you can read about that here PG-STROM



That is all, if you want to see all the questions and answers go here:  SQL Server AMA thread on reddit,

Wednesday, November 16, 2016

Playing around with the new stuff in SQL Server vNext and SQL Server 2016 SP1


Earlier today Microsoft released the first CTP of the next version of SQL Server, I have a post about that here SQL Server 2016 SP1 released, SQL Server vNext available for download

I downloaded and installed SQL Server on a VM (But only after the VM decided to install Windows 10 anniversary update first). I played around with some of the new things, let's take a look

Version
If you select @@VERSION, you get the following back

Microsoft SQL Server vNext (CTP1) - 14.0.1.246 (X64)   Nov  1 2016 23:24:39   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows 10 Pro 6.3 <X64> (Build 14393: ) (Hypervisor)

The compatibility for this version of SQL Server is 140

STRING_AGG

This is a brand new string function in SQL Server, If you have used MySQL then the STRING_AGG is similar to the GROUP_CONCAT function, however you can't use DISTINCT like you can in MySQL

The function STRING_AGG concatenates the values of string expressions and places separator values between them. The separator is not added at the end of string. This function is available beginning with SQL Server 2016 SP1

Here is a sample query

SELECT STRING_AGG (name, ',') AS csv 
FROM master..spt_values
WHERE type = 'A'; 

Here is the output

rpc,pub,sub,dist,dpub,rpc out,data access,collation compatible,system,use remote collation,lazy schema validation,remote proc transaction promotion

Nothing special of course but at least you don't have to write your own version and thinking about removing the last comma

A better example would be if you wanted to get all the columns for a table next to the table name
You would think the query would be like this

SELECT t.name,STRING_AGG (c.name, ',') AS csv
FROM sys.tables t
JOIN sys.columns c on t.object_id = c.object_id
ORDER BY 1

However that gives you the following error

Msg 8120, Level 16, State 1, Line 41
Column 'sys.tables.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

You need to add a GROUP BY to the query

SELECT t.name,STRING_AGG (c.name, ',') AS csv
FROM sys.tables t
JOIN sys.columns c on t.object_id = c.object_id
GROUP BY t.name
ORDER BY 1


Here is what you get back


That looks much better, I can see myself using this a lot in the future


In memory OLTP/Columnstore


According to the documentation sp_spaceused is now supported for in-memory tables.
I tried it out and with SQL Server 2016 I get back 0 rows and 0 KB, with vNext I get back rows but NULL for everything else


  name rows reserved data index_size unused
ShoppingCart 0                   0 KB 0 KB 0 KB 0 KB  -- 2016
ShoppingCart 53248              NULL NULL NULL NULL -- vNext

Not sure what is going on there.

You can now add a columnstore index to a table that has a varchar(max) data type in vNext


CREATE TABLE t_bulkload (
accountkey int not null,
accountdescription varchar (max),
accounttype char(500),
AccountCodeAlternatekey int)
GO

CREATE CLUSTERED COLUMNSTORE index t_bulkload_cci on t_bulkload

Running that on SQL Server 2016 gives the following error

Msg 35343, Level 16, State 1, Line 39
The statement failed. Column 'accountdescription' has a data type that cannot participate in a columnstore index. Omit column 'accountdescription'.

Running that same code on SQL Server vNext 2016 gives the following warning

Warning: Using Lob types (NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX)) with Clustered Columnstore Index is in public preview. Do not use it on production data without backup during public preview period.


Here are some more new things in regard to in memory tables

  • sp_spaceused is now supported for in-memory tables.
  • sp_rename is now supported for native modules.
  • CASE statements are now supported for native modules.
  • The limitation of 8 indexes on in-memory tables has been removed.
  • TOP (N) WITH TIES is now supported in native modules.
  • ALTER TABLE against in-memory tables is now substantially faster in some cases.
  • Transaction redo In-memory tables is now done in parallel. This substantially reduces the time to do failovers or in some cases restarts.
  • In-memory checkpoint files can now be stored on Azure Storage. This provides equivalent capabilities to MDF compared to LDF files, which already have this capability.
As a final note on in memory OLTP, I must say that I ran everything in SQL Server 2016 and also in SQL Server vNext, running the stuff on the vNext instance seemed faster to me.

sys.dm_os_host_info

This is a new OS related system DMV and it returns one row that displays operating system version information.


SELECT * FROM sys.dm_os_host_info;  


Here is the output
host_platformhost_distributionhost_releasehost_service_pack_levelhost_skuos_language_version
WindowsWindows 10
Pro
6.3481033

Here is a sample result set on Linux:
host_platformhost_distributionhost_releasehost_service_pack_levelhost_skuos_language_version
LinuxUbuntu16.04NULL1033


Here is what it looks like in SSMS



That is all for now..... you can find all the new stuff on MSDN https://msdn.microsoft.com/en-US/library/mt788653(SQL.130).aspx

SQL Server 2016 SP1 released, SQL Server vNext available for download


Today Microsoft announced the CTP of the next version of SQL Server, you can download it here https://www.microsoft.com/evalcenter/evaluate-sql-server-vnext-ctp

As you can see I am already downloading this version



What's New in SQL Server vNext https://msdn.microsoft.com/en-US/library/mt788653(SQL.130).aspx

Install SQL Server on Linux https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup

SQL Server on Linux Documentation https://docs.microsoft.com/en-us/sql/linux/


SQL Server Service Pack 1

Also announced was Service Pack 1 of SQL Server 2016, you can download that here https://go.microsoft.com/fwlink/?linkid=835368

There are so many cool things in this service pack. For one,  all the editions now support all the programmability features like indexed views, columnstore indexes, partitioning etc etc, see image below


You are still bound by the memory and CPU limits but at least your code doesn't have to change, this is good news for ISVs.

A couple of more tidbits....

  • CREATE OR ALTER (Yes, we heard you !!!) – New CREATE OR ALTER support makes it easier to modify and deploy objects like Stored Procedures, Triggers, User–Defined Functions, and Views. This was one of the highly requested features by developers and SQL Community.

  • New DMF sys.dm_exec_query_statistics_xml – Use this DMF to obtain actual query execution showplan XML (with actual number of rows) for a query which is still being executed in a given session (session id as input parameter). The showplan with a snapshot of current execution statistics is returned when profiling infrastructure (legacy or lightweight) is on.
  • New DMF for incremental statistics – New DMF sys.dm_db_incremental_stats_properties to expose information per–partition for incremental stats.
  • Better correlation between diagnostics XE and DMVs – Query_hash and query_plan_hash are used for identifying a query uniquely. DMV defines them as varbinary(8), while XEvent defines them as UINT64. Since SQL server does not have “unsigned bigint”, casting does not always work. This improvement introduces new XEvent action/filter columns equivalent to query_hash and query_plan_hash except they are defined as INT64 which can help correlating queries between XE and DMVs.

There is much more, see all the news here: https://blogs.msdn.microsoft.com/sqlreleaseservices/sql-server-2016-service-pack-1-sp1-released/


I also played around with some of the new stuff after installing vNext, see here:
Playing around with the new stuff in SQL Server vNext and SQL Server 2016 SP1