Monday, June 19, 2017

Five great SQL Server GitHub repos that every SQL Server person should check out



Did you know that there are some greats repos on github with awesome SQL scripts that you can use or even contribute to?

I decided to make you aware of some of these.

Tigertoolbox


The Microsoft SQL Server Tiger team has a GitHub repository where they host a bunch of SQL Server scripts

It is called the tigertoolbox. In the image below, you can see what is in this repository



GitHub Repo is here: https://github.com/Microsoft/tigertoolbox


Also make sure to follow the @mssqltiger twitter account to stay up to date with any news from the MS Tiger Team.


The SQL Server Maintenance Solution

This is a GitHub repo by Ola Hallengren
The SQL Server Maintenance Solution comprises scripts for running backups, integrity checks, and index and statistics maintenance on all editions of Microsoft SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, and SQL Server 2016. The solution is based on stored procedures, the sqlcmd utility, and SQL Server Agent jobs

GitHub Repo is here:  https://github.com/olahallengren/sql-server-maintenance-solution

Also make sure to visit the Ola's website for more info here: https://ola.hallengren.com/



dbatools

dbatools is a free PowerShell module with over 200 SQL Server administration, best practice and migration commands included.

GitHub Repo is here: https://github.com/sqlcollaborative/dbatools

Also make sure to visit their website: https://dbatools.io/




Microsoft SQL Server Samples Repository


This GitHub repository contains code samples that demonstrate how to use Microsoft's SQL products including SQL Server, Azure SQL Database, and Azure SQL Data Warehouse. Each sample includes a README file that explains how to run and use the sample.

Repo is here: https://github.com/Microsoft/sql-server-samples



SQL-Server-First-Responder-Kit

This repo by the nice folks from Brent Ozar Unlimited®.has a collection of procs to help you with health checks, maintenance and performance checks. Procs like sp_Blitz, sp_BlitzCache, sp_BlitzFirst, sp_BlitzIndex, and other SQL Server scripts.


GitHub Repo is here: https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit

Make sure to bookmark this page on the BrentOzar.com site to learn how to use these scripts/procs and to keep up to date on any additions/changes: https://www.brentozar.com/first-aid/

Wednesday, June 14, 2017

Running SQL Queries Against Baby Names from Social Security Card Applications



I decided to look around to see if I can find some interesting data sets
I came across the Baby Names from Social Security Card Applications-National Level Data data set and decided it would be a fun exercise to import this data into SQL Server and run some queries

You can download the zip file from the link above, you will see 137 files with a txt extension in the zip file

Here is what the description is for these files

National Data on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number 

(Tabulated based on Social Security records as of March 5, 2017)

For each year of birth YYYY after 1879, we created a comma-delimited file called yobYYYY.txt. Each record in the individual annual files has the format "name,sex,number," where name is 2 to 15 characters, sex is M (male) or F (female) and "number" is the number of occurrences of the name. Each file is sorted first on sex and then on number of occurrences in descending order. When there is a tie on the number of occurrences, names are listed in alphabetical order. This sorting makes it easy to determine a name's rank. The first record for each sex has rank 1, the second record for each sex has rank 2, and so forth.
To safeguard privacy, we restrict our list of names to those with at least 5 occurrences.



The format of the files looks like this

Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578

So these files are simple comma separated values files

Download the zip file from this url: https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data

Unzip the files in a folder on your hard drive, in my case I unzipped it to C:\Names

Let's get started with the fun part

First I created a new DB


CREATE DATABASE Scratch
GO

USE Scratch
GO

Now that we have our database, let's create this very simple table, it will store all the data from the file, I also added a year column to this table



CREATE TABLE Names (FirstName varchar(500), Gender char(1), YearInFile date)
GO

Because the file doesn't have a year column, I created this view so that I can bulk insert into the view instead of having to mess around with SSIS or format files

CREATE VIEW myView
AS SELECT FirstName,Gender,NameCount
FROM Names

I extracted all the files into a folder with the name Names, I placed this on the C drive

Now that you have the table and view created, try running the following

BULK INSERT myView FROM 'c:\Names\yob1932.txt' WITH (FIELDTERMINATOR =',')

It should insert 9382 rows

If that is working, it is time to import all 137 files

I am presenting you a solution here so that you can run this all in SSMS, of course you could have as easy used PowerShell, Python or any other language to import these files. Nothing prevents you from doing that :-)

Attention/warning!!
We will be using xp_cmdshell

It is not a best practice to have xp_cmdshell enabled. As a matter of fact beginning with SQL Server 2005, the product ships with xp_cmdshell disabled. If you try to run xp_cmdshell you will get the following message if it is not enabled
Server: Msg 15281, Level 16, State 1, Procedure xp_cmdshell, Line 1

SQL Server blocked access to procedure ‘sys.xp_cmdshell’ of component ‘xp_cmdshell’ because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of ‘xp_cmdshell’ by using sp_configure. For more information about enabling ‘xp_cmdshell’, see “Surface Area Configuration” in SQL Server Books Online.

To enable xp_cmdshell execute the following code

EXECUTE SP_CONFIGURE 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
GO
 
EXECUTE SP_CONFIGURE 'xp_cmdshell', '1'
RECONFIGURE WITH OVERRIDE
GO
 
EXECUTE SP_CONFIGURE 'show advanced options', 0
RECONFIGURE WITH OVERRIDE
GO

Now it is time to import all the files. First we want to store all the files that we want to import in a table

We are creating a temp table, we then call a DIR command with the B switch. we delete everything that doesn't end in txt. Finally we add an identity column, we will use this to loop over this table


IF OBJECT_ID('tempdb..#tempList') IS NOT NULL
DROP TABLE #tempList
 
CREATE TABLE #tempList ([FileName] VARCHAR(500))
 
--plain vanilla dos dir command with /B switch (bare format)
INSERT INTO #tempList
EXEC MASTER..XP_CMDSHELL 'dir c:\Names /B'
 
 
--delete the null values
DELETE #tempList WHERE [FileName] IS NULL
 
-- Delete all the files that don't have txt extension
DELETE #tempList WHERE [FileName] NOT LIKE '%.txt'
 
--this will be used to loop over the table
ALTER TABLE #tempList ADD id int IDENTITY
go

Let's see what we have in the table

SELECT * FROM #tempList

You will see something like this

FileName id
yob1880.txt 1
yob1881.txt 2
yob1882.txt 3
yob1883.txt 4
yob1884.txt 5
yob1885.txt 6
....................

Now it is time to import all the files, here is the code that will import all 137 files, I put comments in the code so you know what it does


DECLARE @LoopID int, @MaxID int
SELECT @LoopID = min(id),@MaxID = max(ID)
FROM #tempList
 

SET NOCOUNT ON

-- if you want to run this multiple time, wipe out the data
TRUNCATE TABLE Names

-- Change this to your directory
DECLARE @Directory varchar(50) = 'c:\Names\'

 

DECLARE @FileName varchar(500),@FullFileName varchar(500)
 
DECLARE @SQL NVARCHAR(1000)
Declare @year varchar(10)


-- loop over all the file
WHILE @LoopID <= @MaxID BEGIN SELECT @FileNAme = filename FROM #tempList WHERE id = @LoopID -- concatenate the directory and filename SELECT @FullFileName = @Directory + @FileName BEGIN SELECT @SQL = N'BULK INSERT myView FROM ''' + @FullFileName +''' WITH (FIELDTERMINATOR ='','')' -- you can remove this..this basically prints what will be executed PRINT @SQL
    -- run the command
    EXEC SP_EXECUTESQL @SQL
     
  -- this grabs the year part from the file name
 SELECT @year = right(parsename(@filename,2),4)

  -- This will update the rows we just inserted to have the year value
  UPDATE Names 
  SET YearInFile = @year
  WHERE YearInFile is null
    
    END
 
    --Get the next id, instead of +1 we grab the next value in case of skipped id values
    SELECT @LoopID = min(id)
    FROM #tempList
    WHERE id > @LoopID
END


And now that we have the data we can run some queries

Let's see if we have all the data

SELECT COUNT(*) FROM names
--1,891,894

That should return 1,891,894 rows

let's see..did the baby boomers parents really get it on after WW II?

SELECT  SUM(nameCount) as SumCount,YearInFile FROM names
group by YearInFile
ORDER BY SumCount DESC


SumCount YearInFile
-----------   
4200097 1957
4156576 1959
4154809 1960
4140160 1961
4131802 1958
4121130 1956
4035499 1962
4012933 1955
3993206 2007
3980407 1954

Looks like it... but what happened in 2007, top of the housing bubble was 2006/2007 so people started buying houses and needed to populate those rooms with humans  :-)

What are the top 10 most popular names over time?

SELECT TOP 10 FirstName, gender,SUM(Namecount) as Alltime
FROM names
GROUP BY FirstName,gender
ORDER BY 3 DESC


Here are the results, looks like males don't have as many unique names

FirstName gender Alltime
James M 5136026
John M 5105834
Robert M 4809176
Michael M 4338144
Mary F 4120692
William M 4087525
David M 3601032
Joseph M 2592856
Richard M 2560771
Charles M 2378806

What were the most popular names in 1880?

SELECT TOP 10  FirstName, gender,SUM(Namecount)  as [1880]
FROM names
where YearInFile = 1880
GROUP BY FirstName,gender
ORDER BY 3 DESC

Here are those results


FirstName gender 1880
John M 9655
William M 9532
Mary F 7065
James M 5927
Charles M 5348
George M 5126
Frank M 3242
Joseph M 2632
Anna F 2604
Thomas M 2534


For 2016

SELECT TOP 10 FirstName, gender,SUM(Namecount)  as [2016]
FROM names
where YearInFile = 2016
GROUP BY FirstName,gender
ORDER BY 3 DESC

Now we have some names which I recognize since my kids friends are named like hte ones on this list


FirstName gender 2016
Emma F 19414
Olivia F 19246
Noah M 19015
Liam M 18138
Ava F 16237
Sophia F 16070
William M 15668
Mason M 15192
James M 14776
Isabella F 14722


Anybody named after months?


I guess nobody likes February

SELECT FirstName , count(*)
FROM names
WHERE FirstName in('January','February','March','April','May','June',
   'July','August','September','October','November','December')
GROUP BY FirstName

FirstName Count
April 135
August 206
December 56
January 59
July 73
June 273
March 41
May 213
November 37
October 34
September 62

I wonder how many of these names are the result of someone putting the month in the first name field  :-)


How many names are both male and female?

;WITH cte as(
SELECT FirstName 
FROM names
GROUP BY FirstName
HAVING (COUNT( DISTINCT Gender )>1))

SELECT FirstName, gender, count(*) FROM Names n
WHERE EXISTS (SELECT * FROM cte c where c.FirstName = n.FirstName)
GROUP BY FirstName, Gender
ORDER BY 1

As you can see, there are a lot

Here are some where number of males and females doesn't differ that much

FirstName gender Count
Courtney F 123
Courtney M 126
Cruz F 121
Cruz M 114
Tom F 62
Tom M 137


One more query and that is it for this post..

How many people are named after weekdays?

SELECT FirstName , count(*)
FROM names
WHERE FirstName in('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday',
   'Sunday')
GROUP BY FirstName

Here are the results

FirstName Count
-------------       -----------------
Friday          4
Monday          5
Sunday        89
Tuesday        59
Wednesday     51


Last Monday I saw Tuesday who told me that she was going to see Friday on Sunday....

Yep, crystal clear...


That is it for this post.... feel free to run your own queries and if you find some interesting things, feel free to leave me a comment with the results or the query


One more query for you, I added this at the bottom since there are 52 rows in the result

What is the most popular letter a name starts with?

Here is that query, there are a couple of counts here:, the total count, the occurance and the distinct count of a name starting with the letter across all years



SELECT  LEFT(FirstName,1) AS Initial, gender,
SUM(Namecount) AS SumCount, 
COUNT(NameCount) AS TotalCount ,
COUNT(DISTINCT firstname) AS DistinctCount
FROM names
GROUP BY left(FirstName,1),gender
ORDER BY 3 DESC

Here are the results



Initial gender SumCount TotalCount DistinctCount
J M 29,396,710 63,543 3,995
M F 19,855,157 105,326 5,583
A F 16,744,520 117,270 7,501
R M 16,329,781 46,772 2,135
J F 14,887,120 79,019 4,622
D M 14,375,824 62,369 3,380
S F 13,587,753 90,382 6,059
M M 12,717,485 51,078 2,643
C M 12,652,140 51,820 2,255
C F 12,640,899 80,401 4,159
L F 12,099,095 86,872 4,473
A M 11,611,302 66,139 3,594
K F 11,065,934 74,797 4,967
D F 9,723,142 66,592 3,916
E F 9,644,797 49,929 2,477
B F 8,485,706 39,199 2,251
T M 8,331,252 42,525 2,592
B M 8,038,739 37,772 1,701
S M 7,586,177 42,524 2,524
R F 7,216,686 51,892 2,837
E M 7,122,968 36,781 1,542
W M 7,073,357 17,052 577
L M 6,594,743 42,492 1,799
G M 6,587,004 26,930 1,095
T F 5,913,655 61,915 4,277
K M 5,744,651 44,110 3,096
P F 5,024,379 19,759 1,090
N F 4,315,987 36,667 2,507
H F 4,203,255 19,717 1,159
P M 4,181,853 15,150 666
G F 4,083,933 28,566 1,398
H M 3,962,710 26,661 1,128
V F 3,546,832 23,367 1,083
N M 3,354,035 20,738 1,180
F M 3,048,688 15,174 592
F F 2,084,083 17,670 842
I F 1,966,942 17,719 1,066
I M 1,350,255 11,261 573
W F 1,235,983 9,462 569
V M 1,183,415 11,609 559
O M 1,112,184 15,521 724
O F 951,873 12,914 718
Z M 926,015 9,284 833
Y F 656,093 11,343 1,072
Z F 525,515 13,263 1,139
Q M 183,280 3,684 296
X M 172,430 1,175 127
Y M 163,545 7,031 579
U M 93,783 2,875 152
Q F 75,612 2,663 274
X F 54,170 1,466 150
U F 50,450 1,654 169



Not that many names that start with a letter at the end of the alphabet

Tuesday, June 06, 2017

Having fun with maxed out uniqifiers on a non unique clustered index



I was reading the post SQL Server Non-Unique Clustered Indexes Are Evil! by the Lone DB

In that post, the person created the following table


USE boo
GO

CREATE TABLE ClusteredIndexCrash ( i TINYINT NOT NULL ) ;
CREATE CLUSTERED INDEX ClusteredIndexCrashIdx ON dbo.ClusteredIndexCrash( i ) ;


As you can see, the table has a non unique clustered index

If you were to fill this table with the same value 2147483648 times, you will get an error

So if you run this first (this will take a while believe me)


INSERT INTO dbo.ClusteredIndexCrash
SELECT TOP 1000000
    1
FROM
    syscolumns c1 , syscolumns c2 , syscolumns c3 ;

GO 2146

Then you run the following statement


INSERT INTO dbo.ClusteredIndexCrash
SELECT TOP 1000000
    1
FROM
    syscolumns c1 , syscolumns c2 , syscolumns c3 ;


You would get the following error..straight from the beast himself apparently

Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 72057594039173120. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.


I will be using DBCC PAGE and DBCC IND in this blog post, if you want to learn how to use these yourself, take a look at How to use DBCC PAGE


I decided to look in the pages of SQL Server itself. I grabbed all the pages by running DBCC IND


DBCC IND ('boo','ClusteredIndexCrash',  1);


That returned 4806619  rows, these are all the pages that exist for this table

Here is part of the output that matters, the highlighted row where NextPageFID is 0 is what we want, this is the last page.



Click on the image for a larger view


Then I grabbed the last page id from that result set and looked at the page by executing these commands


DBCC TRACEON (3604);
GO
DBCC PAGE (boo, 1, 4810967, 3) with TABLERESULTS;



Here is part of the output in text, I highlighted the uniqifier for you


PAGE HEADER: Slot 158 Offset 0xa40 Length 16 Record Type PRIMARY_RECORD

PAGE HEADER: Slot 158 Offset 0xa40 Length 16 Record Attributes NULL_BITMAP VARIABLE_COLUMNS

PAGE HEADER: Slot 158 Offset 0xa40 Length 16 Record Size 16

Slot 158 Offset 0xa40 Length 16 Memory Dump @0x00000000E832AA40 0000000000000000: 30000500 01020000 01001000 fdffff7f 0...........ýÿÿ.

Slot 158 Offset 0xa40 Length 16 Slot 158 Column 0 Offset 0xc Length 4 Length (physical) 4 UNIQUIFIER 2147483645

Slot 158 Offset 0xa40 Length 16 Slot 158 Column 1 Offset 0x4 Length 1 Length (physical) 1 i 1

Slot 158 Offset 0xa40 Length 16 Slot 158 Offset 0x0 Length 0 Length (physical) 0 KeyHashValue (5eeffbfe6491)

PAGE HEADER: Slot 159 Offset 0xa50 Length 16 Record Type PRIMARY_RECORD

PAGE HEADER: Slot 159 Offset 0xa50 Length 16 Record Attributes NULL_BITMAP VARIABLE_COLUMNS

PAGE HEADER: Slot 159 Offset 0xa50 Length 16 Record Size 16

Slot 159 Offset 0xa50 Length 16 Memory Dump @0x00000000E832AA50 0000000000000000: 30000500 01020000 01001000 feffff7f 0...........þÿÿ.

Slot 159 Offset 0xa50 Length 16 Slot 159 Column 0 Offset 0xc Length 4 Length (physical) 4 UNIQUIFIER 2147483646

Slot 159 Offset 0xa50 Length 16 Slot 159 Column 1 Offset 0x4 Length 1 Length (physical) 1 i 1

Slot 159 Offset 0xa50 Length 16 Slot 159 Offset 0x0 Length 0 Length (physical) 0 KeyHashValue (bedbd571a02d)




Here is the same output as an image from SSMS




Click on the image for a larger view



I decided to play around with this now that I had the table maxed out

What would happen if I deleted a bunch of rows? Would I still get the same error? I deleted 100 rows

DELETE TOP (100)
FROM  ClusteredIndexCrash


Now I tried to insert 1 row..

INSERT INTO dbo.ClusteredIndexCrash
SELECT 1

Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 72057594039173120. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.

So same issue, doesn't reuse uniqifiers


Would an index rebuild fix this?


ALTER INDEX ClusteredIndexCrashIdx ON ClusteredIndexCrash REBUILD


-- 20 minutes later.......



INSERT INTO dbo.ClusteredIndexCrash
SELECT 1


Msg 666, Level 16, State 2, Line 1
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 72057594039566336. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.



Nope, nothing changed


What about a truncate?


TRUNCATE TABLE ClusteredIndexCrash


Now when I run DBCC IND...


DBCC IND ('boo','ClusteredIndexCrash',  1);

This returns nothing.. that is fine..after all this table is empty


Let's try adding a row


INSERT INTO dbo.ClusteredIndexCrash
SELECT 1


No problem, error is gone


I can now again use DBCC IND like before


DBCC IND ('boo','ClusteredIndexCrash', 1);

But since I know I only have 1 row now, I can also use the sys.fn_PhysLocCracker function together with the %%physloc%% virtual column, it will actually give me the page id the row is located on

Here is the query

SELECT top 10 *,%%physloc%% AS PhysicalLocationOfRow
FROM ClusteredIndexCrash
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS FPLC

This is the output from that query

i file_id page_id slot_id PhysicalLocationOfRow
1 1 7668633 0 0x9903750001000000


Running the DBCC PAGE command again with the pageid from the query above


DBCC TRACEON (3604);
GO
DBCC PAGE (boo, 1, 7668633, 3) with TABLERESULTS;



And here is the output from DBCC PAGE



PAGE HEADER: Slot 0 Offset 0x60 Length 9 Record Type PRIMARY_RECORD

PAGE HEADER: Slot 0 Offset 0x60 Length 9 Record Attributes NULL_BITMAP

PAGE HEADER: Slot 0 Offset 0x60 Length 9 Record Size 9

Slot 0 Offset 0x60 Length 9 Memory Dump @0x00000000C41EA060 0000000000000000: 10000500 01020000 01 .........

Slot 0 Offset 0x60 Length 9 Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0 UNIQUIFIER 0

Slot 0 Offset 0x60 Length 9 Slot 0 Column 1 Offset 0x4 Length 1 Length (physical) 1 i 1

Slot 0 Offset 0x60 Length 9 Slot 0 Offset 0x0 Length 0 Length (physical) 0 KeyHashValue (fa469958a30a)


Here it is also as an image




Click on the image for a larger view


As you can see the uniqifier is now 0

So it looks like the uniqifier behaves like an identity property per value, a truncate will reseed it back to 0 but when you delete values uniqifiers are not reused

So maybe we would need a DBCC CHECKUNIQIFIER command  :-)


So how important is this..will you ever run into this?  I can see this becoming an issue if you have a staging table which you don't drop or truncate but you constantly insert and delete data from it

Maybe you should drop the table once in a while?


Have you ever maxed out the uniqifier?

Saturday, May 13, 2017

All the Build 2017 data related sessions in 1 spot, SQL Server, CosmosDB, Azure SQL DB and more




Are you bored this weekend? No problem, I have got you covered, I have put together a page with all the data related sessions from the build 2017 conference. I have embedded the videos and also added links to the page at the channel 9 site. I have also added links to the presentations and source code if it was available.  I have arranged this in 4 categories:

SQL Server
Azure SQL DB
CosmosDB
The rest of the sessions


This way the sessions are not all intermingled

SQL Server


Securing your app data with Microsoft SQL Server 2017 
by Tommy Mullaney

Join us for an overview of strategy and best practices to build secure applications using SQL Server. SQL Server offers a versatile toolset of security capabilities, which enable you to protect sensitive data and meet stringent compliance requirements. You’ll leave with an understanding of how these capabilities fit together to protect your application as a whole, including the new security enhancements available in the latest version of SQL Server.




Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4018



Evolving SQL workloads from software to SaaS: Data for devs immersion
by Tara Shankar Jana

In this modern era; applications must recover from user errors efficiently, must be optimized for transactional performance, provide real-time insights into app transactions, must be secure and complaint, and support seamless integration to cloud, hybrid scenarios and other data types/systems. These applications must also learn and adapt to deliver more powerful experiences using cloud-based data services, applications, and intelligence. In this session we use a scenario-based hands-on lab program (immersion), to offer you a cohesive and consistent experience to build modern on-premises/cloud based applications with intelligence. Build hack-free, high performance, scalable applications using Microsoft SQL Server, Microsoft ML and Azure data services.



Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4013



Advanced SQL Server on Linux (Docker) via Entity Framework (EF)
by Tara Shankar Jana

It’s SQL Server as you already know it today, but it’s now native to Linux. It works with your data, your favorite tools, application frameworks (Entity Framework) and programming languages. In this session, we pull the code for an application running in cloud to a Linux-based machine. We test and enhance the application for performance (using EF, in-memory) and security against SQL vNext (Linux). Then we re-deploy the changes using DevOps to Azure with just a flip of d, show how the application works across platforms, on-premises, and in the cloud. Best part, all of it will be done in ten minutes.



Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4001


SQL Unplugged
by Scott Klein, Rohan Kumar

In this Channel 9 Live session we'll discuss all things SQL



Link to Build site: https://channel9.msdn.com/Events/Build/2017/C9L20


Serving AI with data: How to use R, Python, and machine learning with Microsoft SQL Server 2017
by Umachandar Jayachandran, Nellie Gustafsson

In this session, learn how to use R and Python to perform statistical and machine learning tasks in SQL Server 2017. Also, learn how to operationalize your R or Python scripts using the SQL Server integration.




download:Slides View Slides Online




Microsoft SQL Server 2017 in the modern datacenter: how to run on Linux, Docker, OpenShift, and Kubernetes
by Travis Wright, Tobias Ternstrom

With the on-premises data center evolving into a private cloud, the fast pace of innovation in public clouds, and the strong interest in container technology it begs the question: How do you run a database in this new environment? In this session we examine what it means that SQL Server is evolving to support both Windows and Linux, including running inside a container on both platforms. Should you run SQL Server inside a Docker container? How does it play with orchestration technologies like OpenShift and Kubernetes? The brief answer is that we believe SQL Server should be ubiquitous and run in our customers environment of choice. Come to the session to hear the long answer with a bunch of demos!



Link to Build site: https://channel9.msdn.com/Events/Build/2017/B8080


How to do Predictive Modelling using R and SQL Server ML Services
4 days ago  by Umachandar Jayachandran, Nellie Gustafsson

Learn how to use R scripts from T-SQL to perform training and scoring and leverage parallelism and streaming capabilities to get better performance.



Link to Build site: https://channel9.msdn.com/Events/Build/2017/T6070


Built-in machine learning in Microsoft SQL Server 2017 with Python
by Sumit Kumar

Machine learning services in SQL Server 2017 provides Python support for in-database machine learning, now. In this session we show the basics of how to run Python code in SQL Server. We then discuss how any app that can talk to SQL Server can get intelligence from machine learning models running in SQL Server. We showcase an app that uses a Python-based deep learning model built and deployed in SQL Server. The model leverages an open source deep learning framework running with SQL Server and utilizes GPU for boosting model training performance.




Link to Build site: https://channel9.msdn.com/Events/Build/2017/T6067



Modernize your database development lifecycle with SQL Server Data Tools in Visual Studio
2 days ago  by Tara Raj

Learn how SQL Server Data Tools (SSDT) turns Visual Studio into a powerful environment for database development. Easily build, debug, maintain, and refactor databases inside Visual Studio with a declarative model that spans all the phases of database development and easily enables continuous integration and deployment for your databases. Work offline with a database project, or work directly with a connected database instance in Azure SQL Database, Azure SQL Data Warehouse, and SQL Server running on Windows, Linux, or Docker, on-premises or in any cloud.


Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4009

Azure SQL DB

Get to the cloud faster with Azure SQLDB Managed Instance and Database Migration Service
by Lindsey Allen, Harini Gupta


A new, expanded Azure SQL Database offers fully-managed server instance with greater compatibility with SQL Server application features and more, and the ability to move hundreds of databases at once using the Database Migration Service. It provides security isolation with Azure Virtual Network, along with built-in HADR, built-in intelligent performance tuning, and intelligent security services that are already available in Azure SQL Database. To reduce the friction of transitioning your relational database estate to public cloud, expanded Azure SQL Database provides SQL Server application compatibility including commonly used cross database references using three-part names, CLR, SQL Agent, Transactional Replication, Change Data Capture, Service Broker, and a lot more. We showcase a five-step seamless migration experience from your on-premises SQL Server instances to expanded Azure SQL Database using the newly announced Azure Database Migration Service.



Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4008


Migrating Oracle database to the Azure SQL DB with Database Migration Service
by Shamik Ghosh, Alexander Ivanov

In this virtual session, you learn how to assess, remediate and migrate the schema, data, and server artifacts from on-premises Oracle instances to Azure SQL DB. We walk you through an end-to-end experience using Azure data migration tools and services to help solve your database migration needs.




Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4182 

download: Slides View Slides Online



How to build global-scale applications with Microsoft Azure SQL Database
by Rohan Kumar, Bob Strudwick

Join us in this session to learn how to build a global-scale IoT application by walking through a sample application and real-world customer case study. Gain insight on building an IoT application on a fully managed Azure database as a service, with built-in intelligence for performance, cost and security, dynamic scaling and hassle-free HADR, and infrastructure maintenance. We show you how to use Azure SQL Database enterprise grade in-memory engines to handle high volume and high concurrent transactions; while running real-time analytics queries on the same database. New features are released in Azure SQL Database in fast cadence; we show you the new Graph processing in the context of an IoT scenario, Adaptive Query Processing, predictive analytics, and the first in market built-in homomorphic data encryption feature for securing data at rest and in motion. It will be a fun learning hour.


Link to Build site: https://channel9.msdn.com/Events/Build/2017/B8018


Design Patterns for SaaS applications on Azure SQL Database
by Julie Strauss, Bill Gibson


Experience the power of building multi-tenant SaaS applications on Azure SQL Database, Microsoft’s fully managed database as a service platform: Using a sample SaaS application, we walk through a series of SaaS-focused design and management patterns that have been distilled from work with a multitude of customers. Patterns spans from multi-tenant provisioning, schema management, performance monitoring and management to operational analytics. The code for the sample application, plus management scripts, ARM templates and tutorials, will be available for download in an easy-to-explore “SaaS-in-a-Box” package, enabling you to jump-start your own SaaS application.


Link to Build site: https://channel9.msdn.com/Events/Build/2017/T6025

CosmosDB


CosmosDB
by Seth Juarez, Rimma Nehme

In this session we meet with Rimma Nehme and Seth Juarez to discuss some of the latest Build 2017 announcements around CosmosDB



Link to Build site: https://channel9.msdn.com/Events/Build/2017/C9L08


Azure Cosmos DB: Build planet scale mobile apps in minutes
by Kirill Gavrylyuk

This session has no description, but the video is below



Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4012


Azure Cosmos DB: API for MongoDB
by Andrew Hoh

Azure Cosmos DB (formerly known as Azure DocumentDB) natively supports multiple APIs; one of which is the API for MongoDB. Use existing code, applications, drivers, and tools to work with Azure Cosmos DB. Benefit from the fully managed and scalable Azure database, while continuing to use familiar skills and tools for MongoDB. Come and watch this video to learn about the feature and how to migrate to this cosmic-scale database, Azure Cosmos DB.




Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4011


Azure Cosmos DB: NoSQL capabilities everyone should know about
by Aravind Ramachandran

Microsoft Azure provides a fully managed NoSQL database service built for fast and predictable performance, high availability, elastic scaling, global distribution, and ease of development. As a schema-free NoSQL database, the service provides rich and familiar SQL query capabilities with consistent low latencies on JSON data - ensuring that 99% of your reads are served under 10 milliseconds and 99% of your writes are served under 15 milliseconds. These unique benefits make it a great fit for web, mobile, gaming, IoT, AI, and many other applications that need seamless scale and global replication. Come and learn about the NoSQL capabilities in Azure Cosmos DB that every developer should know about.



Link to Build site: https://channel9.msdn.com/Events/Build/2017/T6058



A lap around Azure HDInsight and Cosmos DB Open Source Analytics + NoSQL
by Andrew Liu, Raghav Mohan


Recently, we released the Spark Connector for our distributed NoSQL service – Azure Cosmos DB (formerly known as Azure DocumentDB). By connecting Apache Spark running on top Azure HDInsight to Azure Cosmos DB, you can accelerate your ability to solve fast-moving data science problems and machine learning. The Spark to Azure Cosmos DB connector efficiently exploits the native Cosmos DB managed indexes and enables updateable columns when performing analytics, push-down predicate filtering against fast-changing globally-distributed data, ranging from IoT, data science, and analytics scenarios. Come learn how you can perform blazing fast planet-scale data processing with Azure Cosmos DB and HDInsight.


Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4010


Data lakes, U-SQL, Azure DBMS  and the rest


How to get started with the new Azure DBMS for PostgreSQL.
by Eric Spear, Irakliy Khaburzaniya, Sunil Kamath

Join this session to learn about Microsoft’s latest announcement of managed PostgreSQL database on Azure. In this breakout, we will learn from two early adopting customers, how they've leveraged this latest database service to innovate faster. We will learn from their experience of using the managed PostgreSQL service, including migrating to the service, and discuss next steps in their application journey. We will walk through some of the key service features and discuss how you as a developer can migrate your existing applications or develop new applications that use this managed PostgreSQL in Azure. If you’re a developer with applications that use PostgreSQL today, whether on-premises or cloud, and want to learn about how this new managed service can help, this session is for you!



Link to Build site: https://channel9.msdn.com/Events/Build/2017/B8046
download: Slides View Slides Online



How to get started with the new Azure DBMS for MySQL.
by Gebi Liang, Jason M. Anderson, Matt Williams


Join this session to learn about Microsoft’s managed MySQL offering in Azure. We’ll walk through Microsoft’s strategy for supporting Open-Source database systems in Azure and what it means to you as a developer as you look to develop or deploy applications that use MySQL in Azure. An overview of the architecture of the service along with how Azure Database for MySQL is integrated with other Azure Services such as Web Apps will also be discussed and demo’d. If you’re a developer with applications using MySQL today, whether on-prem or already in Azure today, this session is for you!



Link to Build site: https://channel9.msdn.com/Events/Build/2017/B8045
download: Slides View Slides Online Source Code



How to serve AI with data: The future of the Microsoft Data Platform
by Joseph Sirosh, Alex Miller

The cloud is truly becoming the “brain” for our connected planet. You’re not just running algorithms in the cloud, rather you’re connecting that with data from sensors from around the world. By bringing data into the cloud, you can integrate all of the information, apply machine learning and AI on top of that, and deliver apps that are continuously learning and evolving in the cloud. Consequently, the devices and applications connected to cloud are learning and becoming increasingly intelligent. Please join us on a journey through the core new patterns that are emerging as we bring advanced intelligence, the cloud, IoT, and big data together.




Link to Build site: https://channel9.msdn.com/Events/Build/2017/B8081


How to run AI at Petabyte Scale with cognitive functions in the Azure Data Lake
by Wee Hyong Tok

In this session, learn how you can use Azure Data Lake (ADL) for doing Big Cognition on raw customer support data. Learn how you can use ADL to perform key phrase extraction, sentiment analysis, and how you can use R/Python scripts for support volume forecasting. In addition, learn how you can use federated queries in ADL with Azure SQL Database. Discover how you can pull all these insights into an Azure SQL Data Warehouse, and using Azure Analysis Services to enable interactive analysis of the processed data. Join us for this exciting session as we show how you can develop intelligent applications by using the insights derived from processing massive Petabyte-scale datasets.


Link to Build site: https://channel9.msdn.com/Events/Build/2017/B8065
download: Slides View Slides Online Source Code



Image processing at scale using U-SQL in Azure Data Lake
by Saveen Reddy

Making use of cognitive capabilities such as Image OCR or Sentiment Analysis of text is straightforward with small datasets of a few terabytes. But, at the scale of hundreds of terabytes or even a petabyte, you need a different approach to that can massively scale out *AND* be simple to build. Azure Data Lake offers a straightforward way of programming using .NET code against these massive Petabyte-scale datasets without the need to become a deep expert in distributed computing, big data technologies, or machine learning.Link to Build site:


Link to Build site: https://channel9.msdn.com/Events/Build/2017/T6040



Lift and shift any runtime into U-SQL for large scale processing
by Michael Rys

One of U-SQL’s major strengths is to scale out user code over your large data lake assets. This presentation first shows how we have used the U-SQL extensibility model to give you access to Python and R and then shows how to run Scala from within U-SQL.


Link to Build site: https://channel9.msdn.com/Events/Build/2017/P4004


There you have have a bunch of data related sessions for your viewing pleasure




Wednesday, May 10, 2017

Not sure that I like the message from Online Resumable Index Rebuild in SQL Server 2017


I was messing around with the Online Resumable Index Rebuild, this is new in CTP 2 of SQL Server 2017. I don't know that I like the output from an resumable index rebuild

Let's take a look at what I did
First I created this table

CREATE TABLE dbo.TestIndexRebuild(
 name nvarchar(35) NULL,
 number int NOT NULL,
 type nchar(3) NOT NULL,
 low int NULL,
 high int NULL,
 status int NULL,
 somebigchar char(2000)
) ON [PRIMARY]

CREATE CLUSTERED INDEX CI_TestIndexRebuild ON TestIndexRebuild(name,number)

I made the table wide by adding a 2000 character column, I then added a clustered index to the table

I then pumped in a bunch of data
INSERT INTO TestIndexRebuild
SELECT *,REPLICATE('A',2000)  
FROM master..spt_values
GO 500


I then executed the following
ALTER INDEX CI_TestIndexRebuild on TestIndexRebuild 
REBUILD WITH (ONLINE=ON, RESUMABLE=ON, MAX_DURATION=1) 

Here is the output I got after 1 minute

Msg 3643, Level 16, State 1, Line 19
The operation elapsed time exceeded the maximum time specified for this operation. The execution has been stopped.
Msg 596, Level 21, State 1, Line 18
Cannot continue the execution because the session is in the kill state.
Msg 0, Level 20, State 0, Line 18
A severe error occurred on the current command.  The results, if any, should be discarded.

I don't really like that, index rebuild has been paused because elapsed time has exceeded the maximum time displayed in black instead of red would have been fine with me. I don't need to see that the session is in a kill state or that a severe error occurred

I then executed the same command again

ALTER INDEX CI_TestIndexRebuild on TestIndexRebuild 
REBUILD WITH (ONLINE=ON, RESUMABLE=ON, MAX_DURATION=1) 

Here is the output

Warning: An existing resumable operation with the same options was identified for the same index on 'TestIndexRebuild'. The existing operation will be resumed instead.

While that was running in a second window, I executed the following

ALTER INDEX CI_TestIndexRebuild on TestIndexRebuild  ABORT 

All you get as output is

Command(s) completed successfully.

However, in the widow where you executed the resumable index rebuild you get this

Msg 1219, Level 16, State 1, Line 1
Your session has been disconnected because of a high priority DDL operation.
Msg 596, Level 21, State 1, Line 0
Cannot continue the execution because the session is in the kill state.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command.  The results, if any, should be discarded.

If you execute the ABORT again while the index is not being rebuilt, you get

Msg 10638, Level 16, State 2, Line 1
ALTER INDEX 'ABORT' failed. There is no pending resumable index operation for the index 'CI_TestIndexRebuild' on 'TestIndexRebuild'.

That makes sense


With a PAUSE Same thing happens as with ABORT, when you pause the index rebuild


ALTER INDEX CI_TestIndexRebuild on TestIndexRebuild  PAUSE

All you get as output is

Command(s) completed successfully

But you get those other messages in the original window


You can also execute a PAUSE followed by an ABORT, you will only get one set of messages, no error is displayed in the window where you executed the code below



ALTER INDEX CI_TestIndexRebuild on TestIndexRebuild  PAUSE 
GO
ALTER INDEX CI_TestIndexRebuild on TestIndexRebuild  ABORT 





I think the kill state and severe error occurred is a little over the top.
What is your opinion?


Now having said all that, I do like the resumable index rebuilds, it pretty much mimics the defragment/reorganize functionality. It continues from where it was when if was running last

There are some more things you can specify, for example, here is a sample command

ALTER INDEX test_idx on test_table RESUME WITH (MAXDOP=2, MAX_DURATION= 240 MINUTES, WAIT_AT_LOW_PRIORITY (MAX_DURATION=10, ABORT_AFTER_WAIT=BLOCKERS)) ;


Read more about this in section J of ALTER INDEX