A blog about SQL Server, Books, Movies and life in general
Showing posts with label r. Show all posts
Showing posts with label r. Show all posts
Tuesday, November 01, 2016
Two free SQL Server books from Microsoft Press
I noticed that the Microsoft Virtual Academy has a bunch of SQL Server books available that you might enjoy. These books are free, the data science one come in pdf format, while the introducing SQL
Data Science with Microsoft SQL Server 2016
Buck Woody, Danielle Dean, Debraj GuhaThakurta, Gagan Bansal, Matt Conners, Wee-Hyong Tok
Introducing Microsoft SQL Server 2016: Mission-Critical Applications, Deeper Insights, Hyperscale Cloud
Stacia Varga, Denny Cherry, and Joseph D’Antoni
There is also a Power BI book available, if you interested in that instead
Here is what is in the 2 books I mentioned
Data Science with Microsoft SQL Server 2016
Chapter 1: Using this book 1
For the data science or R professional 1
Solution example: customer churn 3
Solution example: predictive maintenance
and the Internet of Things 4
Solution example: forecasting 5
For those new to R and data science 7
Step one: the math 8
Step two: SQL Server and Transact-SQL 11
Step three: the R programming language
and environment 14
Chapter 2: Microsoft SQL Server R
Services 16
The advantages of R on SQL Server 16
A brief overview of the SQL Server R
Services architecture 21
SQL Server R Services 21
Preparing to use SQL Server R Services 24
Installing and configuring 24
Server 25
Client 28
Making your solution operational 36
Using SQL Server R Services as a
compute context 36
Using stored procedures with R Code 40
Chapter 3: An end-to-end data science
process example 43
The data science process: an overview 44
The data science process in SQL Server R
Services: a walk-through for R and SQL
developers 47
Data and the modeling task 48
Preparing the infrastructure,
environment, and tools 51
Input data and SQLServerData object 65
Exploratory analysis 68
Data summarization 68
Data visualization 70
Creating a new feature (feature engineering)76
Using R functions 77
Using a SQL function 80
Creating and saving models 83
Using an R environment 84
Using T-SQL 86
Model consumption: scoring data with a
saved model 89
Evaluating model accuracy 95
Summary 97
Chapter 4: Building a customer churn
solution 98
Overview 99
Understanding the data 101
Building the customer churn model 105
Step-by-step 108
Summary 118
Chapter 5: Predictive maintenance and
the Internet of Things 120
What is the Internet of Things? 122
Predictive maintenance in the era of
the IoT 124
Example predictive maintenance use
cases 127
Before beginning a predictive
maintenance project 129
The data science process using SQL Server
R Services 132
Define objective 136
Identify data sources. 137
Explore data 140
Create analytics dataset 142
Create machine learning model 155
Evaluate, tune the model 157
Deploy the model 161
Summary 165
Chapter 6: Forecasting 167
Introduction to forecasting 169
Financial forecasting 169
Demand forecasting 170
Supply forecasting 171
Forecasting accuracy 171
Forecasting tools 173
Statistical models for forecasting 174
Time–series analysis 174
Time–series forecasting 179
Forecasting by using SQL Server R
Services 183
Upload data to SQL Server 183
Splitting data into training and testing 185
Training and scoring time–series
forecasting models 186
Generate accuracy metrics
Introducing Microsoft SQL Server 2016: Mission-Critical Applications, Deeper Insights, Hyperscale Cloud
This book does not have chapters 1 and 5, so don't think that you are going crazy
Chapter 2 Better security 1
Always Encrypted 1
Getting started with Always Encrypted 1
Creating a table with encrypted values 7
CREATE TABLE statement for encrypted columns 7
Migrating existing tables to Always Encrypted 9
Row-Level Security 11
Creating inline table functions 11
Creating security policies 14
Using block predicates 15
Dynamic data masking 15
Dynamic data masking of a new table 16
Dynamic data masking of an existing table 16
Understanding dynamic data masking and permissions 17
Masking encrypted values 18
Using dynamic data masking in SQL Database 18
Chapter 3 Higher availability 20
AlwaysOn Availability Groups 20
Supporting disaster recovery with basic availability groups 21
Using group Managed Service Accounts 23
Triggering failover at the database level 23
Supporting distributed transactions 24
Scaling out read workloads 25
Defining automatic failover targets 26
Reviewing the improved log transport performance 27
Windows Server 2016 Technical Preview high-availability enhancements 28
Creating workgroup clusters 28
Configuring a cloud witness 29
Using Storage Spaces Direct 32
Introducing site-aware failover clusters 32
Windows Server Failover Cluster logging 33
Performing rolling cluster operating system upgrades 33
Chapter 4 Improved database engine 35
TempDB enhancements 35
Configuring data files for TempDB 36
Eliminating specific trace flags 37
Query Store 38
Enabling Query Store 38
Understanding Query Store components 39
Reviewing information in the query store 40
Using Force Plan 42
Managing the query store 43
Tuning with the query store 44
Stretch Database 44
Understanding Stretch Database architecture 45
Security and Stretch Database 45
Identifying tables for Stretch Database 46
Configuring Stretch Database 47
Monitoring Stretch Database 48
Backup and recovery with Stretch Database 49
Chapter 6 More analytics 50
Tabular enhancements 50
Accessing more data sources with DirectQuery 51
Modeling with a DirectQuery source 51
Working with calculated tables 54
Bidirectional cross-filtering 56
Writing formulas 60
Introducing new DAX functions 60
Using variables in DAX 63
R integration 64
Installing and configuring R Services 64
Getting started with R Services 65
Using an R Model in SQL Server 74
Chapter 7 Better reporting 77
Report content types 77
Paginated report development enhancements 77
Introducing changes to paginated report authoring tools 78
Exploring new data visualizations 79
Managing parameter layout in paginated reports 84
Mobile report development 85
KPI development 85
Report access enhancements 86
Accessing reports with modern browsers 86
Viewing reports on mobile
You can get these books here: https://mva.microsoft.com/ebooks/
Sunday, February 21, 2016
How to store twitter search results from R into SQL Server
I was messing around with R and grabbing twitter searches from within R, after that I decided to store the result in a csv file. Instead of stroring the results in a file, what if I want to store that info into a SQL Server database? This is pretty easy to do. If you want to try this out yourself, first head on over to my other post Using twitter from the R console. There you will be able to see how to setup OAuth so that you can use twitter within R, you will also see how to install the R twitteR package as well.
First let's see how to connect to SQL Server from R
To load data from SQL Server into R, you first need to create an ODBC Data Source, I created one named LocalSqlServer, this DSN Data Source points to my local SQL Server 2016 instance.
After you have created your ODBC data source, you need to load the (RODBC) package, create a connection, execute a query, display the results and close the connection.. Here is what it looks like in the R Console
Now that we know how to connect to SQL Server, let's see if we can get the twitteR package to work, head on over to my other post Using twitter from the R console to see how to install twitterR and how some queries are executed.
Assuming the twitteR package is installed, we need to load it to use it
So to recap, here are the commands you need after you loaded the twitterR package and have setup OAuth
Time to get started....
To load data from SQL Server into R, you first need to create an ODBC Data Source, I created one named LocalSqlServer, this DSN Data Source points to my local SQL Server 2016 instance.
After you have created your ODBC data source, you need to load the (RODBC) package, create a connection, execute a query, display the results and close the connection.. Here is what it looks like in the R Console
> require(RODBC)
>db <- odbcConnect("LocalSqlServer")
> Sample <- sqlQuery(db, "select * from sysobjects", stringsAsFactors=FALSE)
> head(Sample)
> close(db) #close ORDBC connectionname id xtype uid info status base_schema_ver 1 sp_MSalreadyhavegeneration -1073624922 P 4 0 0 0 2 sp_MSwritemergeperfcounter -1072815163 P 4 0 0 0 3 TABLE_PRIVILEGES -1072372588 V 3 0 4 sp_replsetsyncstatus -1071944761 X 4 0 0 0 5 sp_replshowcmds -1070913306 P 4 0 0 0 6 sp_publishdb -1070573756 P 4 0 0 0
Now that we know how to connect to SQL Server, let's see if we can get the twitteR package to work, head on over to my other post Using twitter from the R console to see how to install twitterR and how some queries are executed.
Assuming the twitteR package is installed, we need to load it to use it
> library(twitteR)
The next command is the command to enable OAuth, to see how to set this up go here Using twitter from the R console
> setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
[1] "Using direct authentication"
Now it is time to do a simple search
Now it is time to do a simple search
> tweets <- searchTwitter('#rstats', n=6)
> tweets
Running that command should have given you some results. However what we need as output is a dataframe, this way we can easily load this into a table in SQL Server. You can use the twListToDF function
> tweets <- searchTwitter('#rstats', n=6)
> twListToDF(tweets)
The output now looks different
1 RT @rquintino: @Mairos_B #sqlsatportugal session: all about R in #SqlServer 2016 #rstats https://t.co/DHrqIZrz1e
2 a quick script to use imgcat in #rstats https://t.co/fpUlgWNX33 https://t.co/AhCCMLewCH
3 RT @KirkDBorne: Useful packages (libraries) for Data Analysis in R: https://t.co/haRKopFyly #DataScience #Rstats by @analyticsvidhya https:…
4 Hey #rstats tweeps, do you have any readings to recommend on sensitivity analysis? Books/articles/websites all welcome.
5 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
6 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
favorited favoriteCount replyToSN created truncated replyToSID
1 FALSE 0 NA 2016-02-20 20:29:54 FALSE NA
2 FALSE 0 NA 2016-02-20 20:24:50 FALSE NA
3 FALSE 0 NA 2016-02-20 20:16:25 FALSE NA
4 FALSE 0 NA 2016-02-20 20:11:08 FALSE NA
5 FALSE 0 NA 2016-02-20 20:11:06 FALSE NA
6 FALSE 0 NA 2016-02-20 20:02:05 FALSE NA
id replyToUID
1 701141750161784834 NA
2 701140474019577856 NA
3 701138356466483204 NA
4 701137026075140096 NA
5 701137018508722176 NA
6 701134750296227840 NA
statusSource
1 Mobile Web (M5)
2 Tweetbot for Mac
3 Twitter for Android
4 Twitter Web Client
5 Twitter for iPhone
6 Twitter Web Client
screenName retweetCount isRetweet retweeted longitude latitude
1 psousa75 3 TRUE FALSE NA NA
2 millerdl 0 FALSE FALSE NA NA
3 diana_nario 50 TRUE FALSE NA NA
4 emjonaitis 0 FALSE FALSE NA NA
5 caryden 41 TRUE FALSE NA NA
6 ArkangelScrap 41 TRUE FALSE NA NA
The output now has a lot more stuff, you can see if it has been retweeted or favorited as well as the latitude, longtitude and more.
> tweets
Running that command should have given you some results. However what we need as output is a dataframe, this way we can easily load this into a table in SQL Server. You can use the twListToDF function
> tweets <- searchTwitter('#rstats', n=6)
> twListToDF(tweets)
The output now looks different
1 RT @rquintino: @Mairos_B #sqlsatportugal session: all about R in #SqlServer 2016 #rstats https://t.co/DHrqIZrz1e
2 a quick script to use imgcat in #rstats https://t.co/fpUlgWNX33 https://t.co/AhCCMLewCH
3 RT @KirkDBorne: Useful packages (libraries) for Data Analysis in R: https://t.co/haRKopFyly #DataScience #Rstats by @analyticsvidhya https:…
4 Hey #rstats tweeps, do you have any readings to recommend on sensitivity analysis? Books/articles/websites all welcome.
5 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
6 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
favorited favoriteCount replyToSN created truncated replyToSID
1 FALSE 0 NA 2016-02-20 20:29:54 FALSE NA
2 FALSE 0 NA 2016-02-20 20:24:50 FALSE NA
3 FALSE 0 NA 2016-02-20 20:16:25 FALSE NA
4 FALSE 0 NA 2016-02-20 20:11:08 FALSE NA
5 FALSE 0 NA 2016-02-20 20:11:06 FALSE NA
6 FALSE 0 NA 2016-02-20 20:02:05 FALSE NA
id replyToUID
1 701141750161784834 NA
2 701140474019577856 NA
3 701138356466483204 NA
4 701137026075140096 NA
5 701137018508722176 NA
6 701134750296227840 NA
statusSource
1 Mobile Web (M5)
2 Tweetbot for Mac
3 Twitter for Android
4 Twitter Web Client
5 Twitter for iPhone
6 Twitter Web Client
screenName retweetCount isRetweet retweeted longitude latitude
1 psousa75 3 TRUE FALSE NA NA
2 millerdl 0 FALSE FALSE NA NA
3 diana_nario 50 TRUE FALSE NA NA
4 emjonaitis 0 FALSE FALSE NA NA
5 caryden 41 TRUE FALSE NA NA
6 ArkangelScrap 41 TRUE FALSE NA NA
The output now has a lot more stuff, you can see if it has been retweeted or favorited as well as the latitude, longtitude and more.
Time to send the result to SQL Server
There is a problem with how the sqlSave method creates the table, it makes the created column a float and then all inserts fail.
Run the following DDL statement in your database
Now we are ready to save our twitter data in SQL Server, here is how you do it.
There is a problem with how the sqlSave method creates the table, it makes the created column a float and then all inserts fail.
Run the following DDL statement in your database
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | CREATE TABLE [dbo].[tweets]( [rownames] [nvarchar](255) NULL, [text] [nvarchar](255) NULL, [favorited] [nvarchar](5) NULL, [favoriteCount] [float] NULL, [replyToSN] [nvarchar](5) NULL, [created] [datetime] NULL, [truncated] [nvarchar](5) NULL, [replyToSID] [nvarchar](5) NULL, [id] [nvarchar](255) NULL, [replyToUID] [nvarchar](5) NULL, [statusSource] [nvarchar](255) NULL, [screenName] [nvarchar](255) NULL, [retweetCount] [float] NULL, [isRetweet] [nvarchar](5) NULL, [retweeted] [nvarchar](5) NULL, [longitude] [nvarchar](255) NULL, [latitude] [nvarchar](255) NULL ) ON [PRIMARY] |
Now we are ready to save our twitter data in SQL Server, here is how you do it.
> sqlSave(db,append = TRUE,colnames = FALSE,verbose = FALSE, addPK = FALSE, twListToDF(tweets))
Here is what the results look in SSMS
Make sure that you have append = TRUE, if you make it FALSE, the table will be recreated every time but since the command issued creates the wrong data type for the created column, you can't use itSo to recap, here are the commands you need after you loaded the twitterR package and have setup OAuth
> tweets <- searchTwitter('#rstats', n=6) #store the twitter search results > twListToDF(tweets) #convert to dataframe > require(RODBC) # load RODBC package > db <- odbcConnect("LocalSqlServer") #connect > sqlSave(db,append = TRUE,colnames = FALSE,verbose = FALSE, addPK = FALSE, twListToDF(tweets)) #insert data into your table > close(db) #close ORDBC connection
Subscribe to:
Posts (Atom)