Showing posts with label r. Show all posts
Showing posts with label r. Show all posts

Tuesday, November 01, 2016

Two free SQL Server books from Microsoft Press






















I noticed that the Microsoft Virtual Academy has a bunch of SQL Server books available that you might enjoy. These books are free, the data science one come in pdf format, while the introducing SQL

Data Science with Microsoft SQL Server 2016
Buck Woody, Danielle Dean, Debraj GuhaThakurta, Gagan Bansal, Matt Conners, Wee-Hyong Tok


Introducing Microsoft SQL Server 2016: Mission-Critical Applications, Deeper Insights, Hyperscale Cloud
Stacia Varga, Denny Cherry, and Joseph D’Antoni





There is also a Power BI book available, if you interested in that instead

Here is what is in the 2 books I mentioned


Data Science with Microsoft SQL Server 2016

Chapter 1: Using this book 1
For the data science or R professional  1
Solution example: customer churn 3
Solution example: predictive maintenance
and the Internet of Things 4
Solution example: forecasting  5
For those new to R and data science  7
Step one: the math  8
Step two: SQL Server and Transact-SQL 11
Step three: the R programming language
and environment 14

Chapter 2: Microsoft SQL Server R
Services 16
The advantages of R on SQL Server  16
A brief overview of the SQL Server R
Services architecture  21
SQL Server R Services 21
Preparing to use SQL Server R Services 24
Installing and configuring  24
Server  25
Client 28
Making your solution operational  36
Using SQL Server R Services as a
compute context 36
Using stored procedures with R Code  40

Chapter 3: An end-to-end data science
process example 43
The data science process: an overview  44
The data science process in SQL Server R
Services: a walk-through for R and SQL
developers 47
Data and the modeling task 48
Preparing the infrastructure,
environment, and tools 51
Input data and SQLServerData object  65
Exploratory analysis 68
Data summarization 68
Data visualization 70
Creating a new feature (feature engineering)76
Using R functions 77
Using a SQL function  80
Creating and saving models  83
Using an R environment  84
Using T-SQL  86
Model consumption: scoring data with a
saved model 89
Evaluating model accuracy 95
Summary 97

Chapter 4: Building a customer churn
solution 98
Overview 99
Understanding the data  101
Building the customer churn model 105
Step-by-step 108
Summary 118


Chapter 5: Predictive maintenance and
the Internet of Things 120
What is the Internet of Things?  122
Predictive maintenance in the era of
the IoT  124
Example predictive maintenance use
cases 127
Before beginning a predictive
maintenance project 129
The data science process using SQL Server
R Services  132
Define objective 136
Identify data sources. 137
Explore data 140
Create analytics dataset 142
Create machine learning model  155
Evaluate, tune the model 157
Deploy the model  161
Summary 165

Chapter 6: Forecasting  167
Introduction to forecasting 169
Financial forecasting 169
Demand forecasting 170
Supply forecasting 171
Forecasting accuracy  171
Forecasting tools 173
Statistical models for forecasting 174
Time–series analysis 174
Time–series forecasting 179
Forecasting by using SQL Server R
Services 183
Upload data to SQL Server 183
Splitting data into training and testing 185
Training and scoring time–series
forecasting models 186
Generate accuracy metrics


Introducing Microsoft SQL Server 2016: Mission-Critical Applications, Deeper Insights, Hyperscale Cloud


This book does not have chapters 1 and 5, so don't think that you are going crazy


Chapter 2 Better security 1
Always Encrypted 1
Getting started with Always Encrypted 1
Creating a table with encrypted values 7
CREATE TABLE statement for encrypted columns 7
Migrating existing tables to Always Encrypted 9
Row-Level Security 11
Creating inline table functions 11
Creating security policies 14
Using block predicates 15
Dynamic data masking 15
Dynamic data masking of a new table 16
Dynamic data masking of an existing table 16
Understanding dynamic data masking and permissions 17
Masking encrypted values 18
Using dynamic data masking in SQL Database 18


Chapter 3 Higher availability 20
AlwaysOn Availability Groups 20
Supporting disaster recovery with basic availability groups 21
Using group Managed Service Accounts 23
Triggering failover at the database level 23
Supporting distributed transactions 24
Scaling out read workloads 25
Defining automatic failover targets 26
Reviewing the improved log transport performance 27
Windows Server 2016 Technical Preview high-availability enhancements 28
Creating workgroup clusters 28
Configuring a cloud witness 29
Using Storage Spaces Direct 32
Introducing site-aware failover clusters 32
Windows Server Failover Cluster logging 33
Performing rolling cluster operating system upgrades 33

Chapter 4 Improved database engine 35
TempDB enhancements   35
Configuring data files for TempDB 36
Eliminating specific trace flags 37
Query Store 38
Enabling Query Store 38
Understanding Query Store components 39
Reviewing information in the query store 40
Using Force Plan 42
Managing the query store 43
Tuning with the query store 44
Stretch Database 44
Understanding Stretch Database architecture 45
Security and Stretch Database 45
Identifying tables for Stretch Database 46
Configuring Stretch Database 47
Monitoring Stretch Database 48
Backup and recovery with Stretch Database 49

Chapter 6 More analytics 50
Tabular enhancements 50
Accessing more data sources with DirectQuery 51
Modeling with a DirectQuery source 51
Working with calculated tables 54
Bidirectional cross-filtering 56
Writing formulas 60
Introducing new DAX functions 60
Using variables in DAX 63
R integration 64
Installing and configuring R Services 64
Getting started with R Services 65
Using an R Model in SQL Server 74

Chapter 7 Better reporting 77
Report content types  77
Paginated report development enhancements 77
Introducing changes to paginated report authoring tools 78
Exploring new data visualizations 79
Managing parameter layout in paginated reports 84
Mobile report development  85
KPI development 85
Report access enhancements 86
Accessing reports with modern browsers 86
Viewing reports on mobile

You can get these books here: https://mva.microsoft.com/ebooks/

Sunday, February 21, 2016

How to store twitter search results from R into SQL Server

I was messing around with R and grabbing twitter searches from within R, after that I decided to store the result in a csv file. Instead of stroring the results in a file, what if I want to store that info into a SQL Server database? This is pretty easy to do. If you want to try this out yourself, first head on over to my other post Using twitter from the R console. There you will be able to see how to setup OAuth so that you can use twitter within R, you will also see how to install the R twitteR package as well.

Time to get started....

First let's see how to connect to SQL Server from R
To load data from SQL Server into R, you first need to create an ODBC Data Source, I created one named LocalSqlServer, this DSN Data Source points to my local SQL Server 2016 instance.


After you have created your ODBC data source, you need to load the (RODBC) package, create a connection, execute a query, display the results and close the connection.. Here is what it looks like in the R Console



> require(RODBC)

>db <- odbcConnect("LocalSqlServer")

> Sample <- sqlQuery(db, "select * from sysobjects", stringsAsFactors=FALSE)

> head(Sample)
> close(db)  #close ORDBC connection
name id xtype uid info status base_schema_ver 1 sp_MSalreadyhavegeneration -1073624922 P 4 0 0 0 2 sp_MSwritemergeperfcounter -1072815163 P 4 0 0 0 3 TABLE_PRIVILEGES -1072372588 V 3 0 4 sp_replsetsyncstatus -1071944761 X 4 0 0 0 5 sp_replshowcmds -1070913306 P 4 0 0 0 6 sp_publishdb -1070573756 P 4 0 0 0

Now that we know how to connect to SQL Server, let's see if we can get the twitteR package to work, head on over to my other post Using twitter from the R console to see how to install twitterR and how some queries are executed.


Assuming the twitteR package is installed, we need to load it to use it


library(twitteR)

The next command is the command to enable OAuth, to see how to set this up go here Using twitter from the R console

setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
[1] "Using direct authentication"

Now it is time to do a simple search

tweets <- searchTwitter('#rstats', n=6) 
tweets





Running that command should have given you some results. However what we need as output is a dataframe, this way we can easily load this into a table in SQL Server. You can use the twListToDF function


tweets <- searchTwitter('#rstats', n=6) 
twListToDF(tweets)


The output now looks different



1                             RT @rquintino: @Mairos_B #sqlsatportugal session: all about R in #SqlServer 2016 #rstats https://t.co/DHrqIZrz1e
2                                                      a quick script to use imgcat in #rstats https://t.co/fpUlgWNX33 https://t.co/AhCCMLewCH
3 RT @KirkDBorne: Useful packages (libraries) for Data Analysis in R: https://t.co/haRKopFyly #DataScience #Rstats by @analyticsvidhya https:…
4                      Hey #rstats tweeps, do you have any readings to recommend on sensitivity analysis? Books/articles/websites all welcome.
5 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
6 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
  favorited favoriteCount replyToSN             created truncated replyToSID
1     FALSE             0        NA 2016-02-20 20:29:54     FALSE         NA
2     FALSE             0        NA 2016-02-20 20:24:50     FALSE         NA
3     FALSE             0        NA 2016-02-20 20:16:25     FALSE         NA
4     FALSE             0        NA 2016-02-20 20:11:08     FALSE         NA
5     FALSE             0        NA 2016-02-20 20:11:06     FALSE         NA
6     FALSE             0        NA 2016-02-20 20:02:05     FALSE         NA
                  id replyToUID
1 701141750161784834         NA
2 701140474019577856         NA
3 701138356466483204         NA
4 701137026075140096         NA
5 701137018508722176         NA
6 701134750296227840         NA
                                                                            statusSource
1                Mobile Web (M5)
Tweetbot for Mac
3   Twitter for Android
4                     Twitter Web Client
5     Twitter for iPhone
6                     Twitter Web Client
     screenName retweetCount isRetweet retweeted longitude latitude
1      psousa75            3      TRUE     FALSE        NA       NA
2      millerdl            0     FALSE     FALSE        NA       NA
3   diana_nario           50      TRUE     FALSE        NA       NA
4    emjonaitis            0     FALSE     FALSE        NA       NA
5       caryden           41      TRUE     FALSE        NA       NA
6 ArkangelScrap           41      TRUE     FALSE        NA       NA



The output now has a lot more stuff, you can see if it has been retweeted or favorited as well as the latitude, longtitude and more.

Time to send the result to SQL Server

There is a problem with how the sqlSave method creates the table, it makes the created column a float and then all inserts fail.


Run the following DDL statement in your database
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
CREATE TABLE [dbo].[tweets](
	[rownames] [nvarchar](255) NULL,
	[text] [nvarchar](255) NULL,
	[favorited] [nvarchar](5) NULL,
	[favoriteCount] [float] NULL,
	[replyToSN] [nvarchar](5) NULL,
	[created] [datetime] NULL,
	[truncated] [nvarchar](5) NULL,
	[replyToSID] [nvarchar](5) NULL,
	[id] [nvarchar](255) NULL,
	[replyToUID] [nvarchar](5) NULL,
	[statusSource] [nvarchar](255) NULL,
	[screenName] [nvarchar](255) NULL,
	[retweetCount] [float] NULL,
	[isRetweet] [nvarchar](5) NULL,
	[retweeted] [nvarchar](5) NULL,
	[longitude] [nvarchar](255) NULL,
	[latitude] [nvarchar](255) NULL
) ON [PRIMARY]

Now we are ready to save our twitter data in SQL Server,  here is how you do it.


> sqlSave(db,append = TRUE,colnames = FALSE,verbose = FALSE, addPK = FALSE, twListToDF(tweets))
Here is what the results look in SSMS



Make sure that you have append = TRUE, if you make it FALSE, the table will be recreated every time but since the command issued creates the wrong data type for the created column, you can't use it


So to recap, here are the commands you need after you loaded the twitterR package and have setup OAuth


> tweets <- searchTwitter('#rstats', n=6) #store the twitter search results
> twListToDF(tweets)  #convert to dataframe

> require(RODBC)  # load RODBC package
> db <- odbcConnect("LocalSqlServer")  #connect
> sqlSave(db,append = TRUE,colnames = FALSE,verbose = FALSE, addPK = FALSE, twListToDF(tweets))  #insert data into your table
> close(db)  #close ORDBC connection