Thursday, March 31, 2016

Microsoft releases SQL Server Horizontica, a Vertica killer



Amsterdam, Noord Holland, Nederlands, European Union. — April 1, 2016 — Microsoft Corp. on Friday morning announced that it has released the next version of its flagship Relational (and beyond) Database Management System SQL Server Horizontica.

Here is a quick rundown of all the new, exciting and shiny things......

Goes to eleven.
What we have done here is, if you select this option, we will offload very CPU intensive operations to the GPU. Think about it, GPUs are many times faster than CPUs… hookup a bunch of graphic card to you server and your CPU utilization goes down by 99%. Here is what an external GPU rig look like from one of our biggest beta testers



Mr Stein who is a member of that team will explain what he likes the most about Goes To Eleven.

Frank E. Stein

The best part about Goes To Eleven is that if the GPU used capacity is below 50%, SQL Server will automatically use the excess capacity to start bitcoin mining. This in turn will lower the cost of running SQL Server, it really is a no brainer. When you need capacity it is there, when you don't need it, you earn money. Being a former Oracle customer where the licensing model is "How much money do you have?" this is really welcome.

Some more new goodies

BSON
Every vendor has JSON by this point, we took it s step further and added Binary JSON also known as BSON support


MINDOP
Minimum degree of parallelism. Brand new in this version, you can now specify the minimum number of CPUs/GPUs that a query should use. If you know your query is going to return a lot of data...instead of adding indexes...think big and use a minimum of 128 CPUs/GPUs for your query

Rowstore
When we looked at Vertica and Sybase IQ customers we noticed a lot of queries executed were using SELECT *, this is very slow with a columnar data storage. With our patent pending rowstore, it is as fast as the traditional page store. The ROI is immense, no more do programmers and brogrammers have to find all the code where a table is used to add the additional column, SELECT * is always correct. Rowstore is also more natural, when getting the data back people read from left to right or right to left, people don't read from top to bottom, since the data is already stored in rows...no transform needs to happen, this also makes data display faster.

Variable page sizes
Instead of just 8K page sizes, SQL Server now also offers the following page sizes: 2K, 4K, 8K, 16K, 32K, 64K and 640K. The reason we have 640K is because that is the max page size anyone will ever need, you probably will not use it now, but you might in 10 years, we are already prepared for that

Smallishint
Smallishint is a 3 byte integer data type, bigger than tinyint and smaller than smallint. You can store values up to 4096

Myriaton
We took the concepts of Hekaton and got another 100x improvement

One of our customers will explaining how they are using it and how it has improved their applications

Dr A. Cula


I am Dr A. Cula and I work for Ichor, a company specializing in eternal life. We are running a big blood bank, we need to keep this stocked for our mmm inner circle members. We use tagging of subjects so that we can go and mmm get more blood when needed. We were always running into problems when summer came along..our mmm dispatchers and collectors don't like to work when it is sunny. So we needed really good performance during summer hours. After switching SQL Server Horizontica and using Myriaton we now are so efficient that we opened up 4 more blood banks. We could never have envisioned something like this was possible, because of this we also expanded our inner circle by 200%


Universal Stretch and Universal Availability Groups
This is really exciting, instead of a regular stretch database on Azure, Universal Stretch will copy your data on one of our data centers located in the universe. This year it is just on the Moon where we have a data center already, in 2 years you can also use the one we are currently sending to Mars. Then when we start sending people to Mars, you are already open for business...the data is already there and it will be synchronized between Earth, Moon and Mars. Universal Availability Groups work the same way, you plug in the planet or moon and we take care of the rest.


ChandarUma ChandranJaya (also known as CU) program manager on the SQL Server programmability team will fill us in on some additional new features not mentioned yet.

New JOINS

ANTI JOIN
This is a brand new JOIN and it is ANSI SQL 2022 compliant. It is basically a JOIN between 2 tables on a column where the values are not the same (Think like NOT EXISTS)


SELECT *
FROM Table1 t1
ANTI JOIN Table2 t2
ON t1.ID = T2


CUBE JOIN
The CUBE JOIN takes the CROSS JOIN to whole other level...instead of returning all rows in one table for each row in the other tables.... the CUBE JOIN returns all values in all columns in all rows in one table for each value in all columns in all rows in another table... it is like a hypercube...


SELECT * 
FROM Table1 t1
CUBE JOIN Table2 t2

NATURAL JOIN
The  NATURAL JOIN joins two tables on columns with the same name


SELECT * 
FROM Table1 t1
NATURAL JOIN Table2 t2

KEY JOIN
The KEY JOIN is really cool, what it does it joins tables based on primary and foreign keys, no need to specify these in the join condition anymore


SELECT * 
FROM Table1 t1
KEY JOIN Table2 t2


New Traceflags
Traceflag 666
This is a brand new traceflag, it will give your SQL Server instance hellish performance. It acts like a Double Distributed Denial Of Service (DDDOS) attack against you SQL Server instance...... IO, Memory.... everything will be completely saturated... however you can still use the DAC (Dedicated Administrative Connection) to get in and run queries against DMVs to check the Subsystem poerformance

New functions
We have implemented the following new functions
IsDaylightSavingsTime
IsNotWeekend
Trim
sp_MsForEachServer
DBCC CheckCrappyCode
DBCC
Quartile
DCOUNT
DAYS360
IsOdd
IsNontext
Gigo
Lifo
Fifo
Fido

Specialized indexes
We now have the following new indexes

DateIndex  designed to work with all the date data types

FunctionIndex, finally you can index for example the first 5 characters of a column only, you can use all the functions available in SQL Server

Swift
Swift is a brand new storage engine, you can use this when you don't need constraints, keys..after all real developers manage those in the app tier. You enable this per database and you can make the same face as Taylor when thinking how you don't have to deal with all those nasty constrains that just get in the way. The beauty of Swift is that if you do have constraints in your DDL scripts, they will be just silently ignored. No more scripts that have to run in a specific order...it always runs fine..., your manager will love, no adore you..



There are many more things that are new, you can visit the SQL Server homepage to find out even more exciting stuff









Monday, March 28, 2016

The most interesting Build sessions for people working with data



Microsoft's Build conference is happening this week, the schedule for all the sessions was just made available. Below are all the sessions that will be most interesting to people who work with data. Clicking on the session's title will bring you to the Channel 9 page that will have the content. These session recording are of course not available yet, you will need to wait till after the conference is done to see them.



Intelligent Data Driven Applications that Learn and Adapt
Shawn Bice, Pablo Castro
Azure, Data, Intelligent App, Machine Learning
200 - Intermediate

Applications show intelligence when they can spot trends, react to unusual events, predict outcomes or recommend choices. Learn how to introduce intelligence traits into your apps including; establishing feedback loops, applying big data and machine learning techniques to classify, predict or otherwise analyze explicit and implicit signals, and operationalizing the full stack into the regular usage flow of the app. Most every day apps, from consumer to enterprise can deliver greater customer or business benefit by learning from user behavior and other signals. In this session we’ll take a pragmatic look at introducing real, useful data-driven intelligence into apps by walking through services, code and data needed to make it happen.


SQL Database Technologies for Developers
Lindsey Allen, Tony Petrossian
Azure, Data, Intelligent App
200 - Intermediate

Microsoft offers SQL Server and Azure SQL Database to help you develop great relational database applications. In this session you will learn about the top developer features coming in SQL Server 2016 which are already in Azure SQL Database. Additionally, you will see the latest investments in Azure SQL Database that enable you to easily manage thousands of databases and get the performance and security insight needed to build robust and secure applications in the cloud.



Adding Power BI Data Experiences to Your Applications
Josh Caplan, Lukasz Pawlowski
Analytics, Data
300 - Experienced

Bring stunning and interactive Power BI data experience to your applications. Learn how to integrate to Power BI using the Power BI REST APIs and how to build your own custom visualizations.


Delivering Applications at Scale with DocumentDB, Azure's NoSQL Document Database
John Macintyre, Dharma Shukla
Azure, Data, Intelligent App
200 - Intermediate

What does a #1 app on the iTunes App Store and an installment of one of the most successful gaming franchises in history have in common? They both use Azure DocumentDB to ingest massive volumes of data and serve low latency queries to provide great user experiences. Come learn about the business goals and technical challenges faced by two real-world, immensely popular applications and why they chose to use Azure DocumentDB, as well as the application patterns they used to achieve their massive scale requirements.



Azure IoT: Complete Cloud Offerings for the IoT Revolution
Sam George, Cameron Skinner
Azure, Data, Internet of Things
200 - Intermediate

IoT is the next revolution in computing and an incredibly exciting space. A critical part of IoT is cloud based solutions that enable you to connect, secure and manage IoT devices as well as providing deep insights from IoT data. Azure is all-in on IoT, this session will cover our industry leading offerings. Come be part of the next revolution!


Data Science for Developers
Danielle Dean, Daniel Grecoe
Analytics, Data
200 - Intermediate

In this session you will learn about Azure Machine Learning through a comprehensive end-to-end example that he builds during the session and that encompasses: •Problem detection •Algorithm selection •Machine learning model creation and deployment as a RESTful web service •Consumption of the machine learning model The session is intended for engineers, and Dan himself is an engineer, so he does not delve into a deep understanding of complex mathematical models behind machine learning, but instead focuses on the concepts of machine learning to demystify cloud machine learning.


Azure Data Lake and Azure Data Warehouse: Applying Modern Practices to Your App
Tim Mallalieu, Lara Rubbelke
Azure, Data
200 - Intermediate

Azure Data Warehouse and Azure Data Lake are two new services designed to work with all of your data no matter how big or complex. With two strong options to store, process and analyze large volumes of data, you may be curious about which service is right for your application needs. This session will drill into the strengths of each service and walk through the common application patterns for integration of large scale data analysis in the cloud using these services.


Advanced Analytics with R and SQL
Nagesh Pabbisetty, Tobias Ternstrom
Analytics, Data, Intelligent App
200 - Intermediate

R is the lingua franca of Analytics. SQL is the world’s most popular database language. What magic can you make happen by combining the power of R and SQL for Data Science and Advanced Analytics? Imagine the power of exploring, transforming, modeling, and scoring data at scale from the comfort of your favorite R environment. Now, imagine operationalizing the models you create directly in SQL Server, allowing your applications to use them from T-SQL, executed right where your data resides. Come learn how to build and deploy intelligent applications that combine the power of R, SQL Server, thousands of open source R extension packages, and high-performance implementations of the most popular machine learning algorithms at scale.


Insight from Exhaust, Enriching Your Applications with Data
Matthew Winkler
Azure, Analytics, Data, Analytics/Telemetry, .NET, C#, Machine Learning
200 - Intermediate

Modern cloud applications provide new opportunities every day for greater insights into our customers, code and quality using the signals in the application’s exhaust. Using the data produced by users and applications, developers can easily improve the experience for users and increase their engagement within the apps they are using. Starting with a typical application deployed in Azure, this talk will walk through the steps required to build a more intelligent application, customized for each user to provide a more engaging experience. We’ll add more engaging recommendations tailored to a user, more relevant in-application search results, and gain deeper understanding the application’s users. Additionally, we’ll cover how to process this telemetry as it is created to enable us to react in real time to changes in the application. No knowledge of machine learning, data science, or big data is required, by the end you’ll learn to use all three to create a richer application.

Tuesday, March 08, 2016

How to get SQL Server on Linux?


In order to get SQL Server on Linux you need to go to the following page: https://www.microsoft.com/en-us/server-cloud/sql-server-on-linux.aspx. Click on the part where is says Sign up to stay informed and to apply for the preview. It will bring you to a signup page that looks like this



After you filled out all the information required you will have to wait for a reply. Be aware that currently the SQL Server on Linux preview is limited to Ubuntu or as a Docker image

Monday, March 07, 2016

SQL Server 2016 Release Candidate 0 now available for download


The SQL Server engineering team just announced the availability of SQL Server 2016 Release Candidate 0.


To learn more about the release, visit the SQL Server 2016 preview page. To experience the new, exciting features in SQL Server 2016 and the new rapid release model, download the preview and start evaluating the impact these new innovations can have for your business.


For more information on this release, see SQL Server 2016 Release Notes and What’s New in SQL Server 2016.

Some of the more interesting things added is that instead of a trace flag, you can now execute a 
ALTER DATABASE SCOPED CONFIGURATION command.

It looks like this

ALTER DATABASE SCOPED CONFIGURATION
{
[FOR SECONDARY]
{ SET  |  CLEAR PROCEDURE_CACHE }
}[;]
< set_options > ::=
{ MAXDOP = { | PRIMARY}
| LEGACY_CARDINALITY_ESTIMATION = { ON | OFF | PRIMARY}
| PARAMETER_SNIFFING = { ON | OFF | PRIMARY}
| QUERY_OPTIMIZER_HOTFIXES = { ON | OFF | PRIMARY}
| ONLINE_OPERATION_DEFAULT = {ON | OFF} }


There is also finally a built in split string function.

It looks like this

SELECT * FROM STRING_SPLIT('Hi how are you doing?',' ')

The result is:

value
—–
Hi
how
are
you
doing?

Finally, there will be an event on March 10, 2016: SQL Server 2016 — Take the lead, watch the live stream

Get the RC 0  here: SQL Server 2016 Release Candidate 0 

Friday, March 04, 2016

Gotcha with the LEAD analytical function and date datatype

The other day some code that used the LEAD analytical function stopped working suddenly. It started to throw an error.

Msg 206, Level 16, State 2, Line 1
Operand type clash: date is incompatible with int

The code wasn't touched in a long time... so what could be the issue. Let's reproduce it here


First create the following table


CREATE TABLE testRunningTotal(Id tinyint,SomeDate datetime, Charge decimal(20,2))

INSERT testRunningTotal
SELECT 1,'20120101',1000
UNION ALL
SELECT 1,'20120401',200
UNION ALL
SELECT 1,'20120501',300
UNION ALL
SELECT 1,'20120601',600
UNION ALL
SELECT 2,'20120101',100
UNION ALL
SELECT 2,'20130101',500
UNION ALL
SELECT 2,'20140101',-800
UNION ALL
SELECT 3,'20120101',100

Running this query to see what is in this table


SELECT * FROM testRunningTotal

Id SomeDate Charge
1 2012-01-01 00:00:00.000 1000.00
1 2012-04-01 00:00:00.000 200.00
1 2012-05-01 00:00:00.000 300.00
1 2012-06-01 00:00:00.000 600.00
2 2012-01-01 00:00:00.000 100.00
2 2013-01-01 00:00:00.000 500.00
2 2014-01-01 00:00:00.000 -800.00
3 2012-01-01 00:00:00.000 100.00


Now run our fancy query...


SELECT id, someDate as StartDate,
LEAD(SomeDate -1,1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM testRunningTotal
  ORDER BY id, SomeDate


Here are our results

id StartDate Enddate Charge RunningTotal
1 2012-01-01 00:00:00.000 2012-03-31 00:00:00.000 1000.00 1000.00
1 2012-04-01 00:00:00.000 2012-04-30 00:00:00.000 200.00 1200.00
1 2012-05-01 00:00:00.000 2012-05-31 00:00:00.000 300.00 1500.00
1 2012-06-01 00:00:00.000 9999-12-31 00:00:00.000 600.00 2100.00
2 2012-01-01 00:00:00.000 2012-12-31 00:00:00.000 100.00 100.00
2 2013-01-01 00:00:00.000 2013-12-31 00:00:00.000 500.00 600.00
2 2014-01-01 00:00:00.000 9999-12-31 00:00:00.000 -800.00 -200.00
3 2012-01-01 00:00:00.000 9999-12-31 00:00:00.000 100.00 100.00


To see what the query actually does, visit this post Easy running totals with windowing functions

Let's take a closer look at the LEAD analytical function, this is what Books On Line has on its page

  LEAD ( scalar_expression [ ,offset ] , [ default ] ) 
    OVER ( [ partition_by_clause ] order_by_clause )

offset
The number of rows forward from the current row from which to obtain a value. If not specified, the default is 1. offset can be a column, subquery, or other expression that evaluates to a positive integer or can be implicitly converted to bigint. offset cannot be a negative value or an analytic function.

Looks all cool to me..nothing that stands out that could cause a problem...



What will happen if someone changes the column from a datetime to a date? Maybe they looked at the data and saw that all the values had only dates and no time components... why use 8 bytes of storage when you can use 3 instead?



To find out exactly what will happen, run the following DDL statement....


ALTER TABLE testRunningTotal ALTER COLUMN SomeDate date


Now go run your query again


SELECT id, someDate as StartDate,
LEAD(SomeDate -1,1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM testRunningTotal
  ORDER BY id, SomeDate

Here is the error you get

  Msg 206, Level 16, State 2, Line 1
Operand type clash: date is incompatible with int


The error is the same as if you would run the following


SELECT CONVERT(DATE,GETDATE() )-1

Instead what you have to do is use the DATEADD function to do arithmetic with dates


SELECT DATEADD(dd,-1,CONVERT(DATE,GETDATE()))


So to fix our query from before, we want to change the -1 to be DATEADD(dd,-1 instead, here is what our query looks like


 SELECT id, someDate as StartDate,
LEAD(dateadd(dd,-1,SomeDate),1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM testRunningTotal
  ORDER BY id, SomeDate

And here are our results, it's all good now

id StartDate Enddate Charge RunningTotal
1 2012-01-01 2012-03-31 1000.00 1000.00
1 2012-04-01 2012-04-30 200.00 1200.00
1 2012-05-01 2012-05-31 300.00 1500.00
1 2012-06-01 9999-12-31 600.00 2100.00
2 2012-01-01 2012-12-31 100.00 100.00
2 2013-01-01 2013-12-31 500.00 600.00
2 2014-01-01 9999-12-31 -800.00 -200.00
3 2012-01-01 9999-12-31 100.00 100.00


I still think that maybe the LEAD function should do this conversion internally since all we are doing is specifying the OFFSET...what is your opinion?


Thursday, March 03, 2016

Did you know I have a github repository for stuff I run over and over?



There are queries or commands that I run every now and then. I typically don’t have these stored on my computer; to find this code I usually find the blog post I made and copy paste it from there. This becomes a pain in the neck sometimes because I don’t quite remember the title of the post, so I might be spending a good amount of time looking for it. I decided to put some of the code that I run frequently up on github.

You will see code to find index fragmentation as well code to concatenate rows into columns. So there are two kinds of code here. Code that you can run as is and code that you will plug into a stored procedure after you make changes to the table names. You can find that github repository here https://github.com/SQLMenace/CommonTasks

Besides the Common Tasks repository, I also have a Today I Learned repository. A collection of concise write-ups on small things I learn day to day across a variety of languages and technologies. These are things that don't really warrant a full blog post.

You can find that project here https://github.com/SQLMenace/til

With the Today I Learned repository you will find that there are not many things yet. The reason is that I just started this repository, but since SQl Server 2016 is just around the corner, I will start playing around with SQL Server 2016 more and more, thus the Today I Learned repository will become bigger soon.

Wednesday, March 02, 2016

Easy running totals with windowing functions

Back in the pre SQL Server 2012 days, if you wanted to do a running count, you either had to run a subquery or you could use a variable. This was slow because for each row the query that did the sum would be executed. With the additions to the windowing functions in SQL Server 2012, this is now running much faster. 

Let's take a look, first create the following table


CREATE TABLE #test(Id tinyint,SomeDate date, Charge decimal(20,10))

insert #test
SELECT 1,'20120101',1000
UNION ALL
SELECT 1,'20120401',200
UNION ALL
SELECT 1,'20120501',300
UNION ALL
SELECT 1,'20120601',600
UNION ALL
SELECT 2,'20120101',100
UNION ALL
SELECT 2,'20130101',500
UNION ALL
SELECT 2,'20140101',-800
UNION ALL
SELECT 3,'20120101',100


let's check that data we just inserted into the temporary table


SELECT * FROM #test


The output looks like this

Id SomeDate Charge
1 2012-01-01 1000.0000000000
1 2012-04-01 200.0000000000
1 2012-05-01 300.0000000000
1 2012-06-01 600.0000000000
2 2012-01-01 100.0000000000
2 2013-01-01 500.0000000000
2 2014-01-01 -800.0000000000
3 2012-01-01 100.0000000000


What we want is the following

id StartDate Enddate         Charge         RunningTotal
1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000
1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000
1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000
2 2012-01-01 2012-12-31 100.0000000000 100.0000000000
2 2013-01-01 2013-12-31 500.0000000000 600.0000000000
2 2014-01-01 9999-12-31 -800.0000000000 -200.0000000000
3 2012-01-01 9999-12-31 100.0000000000 100.0000000000

For each row, we want to have the date that the row starts on and also the date when it end, we also want a running total as well. If there is no row after the current row for that id, we want the end date to be 9999-12-31.

So we will use a couple of functions. The first one is LEAD, LEAD accesses data from a subsequent row in the same result set without the use of a self-join. So the LEAD part looks like this

LEAD(dateadd(dd,-1,SomeDate),1,'99991231')  OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,

What we are doing is subtracting 1 from the date in the subsequent row (ateadd(dd,-1,SomeDate))
We are using 1 as the offset since we want to apply this to the next row. Finally if there is no subsequent row, we want to use the date 9999-12-31 instead of NULL

To do the running count, we will do the following

SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          AS RunningTotal

What this means in English is for each id ordered by date, sum up the charge values for the rows between the preceding rows and the current row. Here is what all that stuff means.

ROWS BETWEEN
Specifies the rows that make up the range to use as implied by

UNBOUNDED PRECEDING
Specifies that the window starts at the first row of the partition. UNBOUNDED PRECEDING can only be specified as window starting point.

CURRENT ROW
Specifies that the window starts or ends at the current row when used with ROWS or the current value when used with RANGE.
CURRENT ROW can be specified as both a starting and ending point.

And here is the query


SELECT id, someDate as StartDate,
LEAD(dateadd(dd,-1,SomeDate),1,'99991231') 
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM #test
  ORDER BY id, SomeDate


And running that query, gives us the running count as well as the dates

id StartDate Enddate         Charge         RunningTotal
1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000
1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000
1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000
2 2012-01-01 2011-12-31 100.0000000000 100.0000000000
2 2012-01-01 2011-13-31 500.0000000000 600.0000000000
2 2012-01-01 9999-12-31 -800.0000000000 -200.0000000000
3 2012-01-01 9999-12-31 100.0000000000 100.0000000000
That's all for this post...