I was messing around with R and grabbing twitter searches from within R, after that I decided to store the result in a csv file. Instead of stroring the results in a file, what if I want to store that info into a SQL Server database? This is pretty easy to do. If you want to try this out yourself, first head on over to my other post
Using twitter from the R console. There you will be able to see how to setup OAuth so that you can use twitter within R, you will also see how to install the R twitteR package as well.
Time to get started....
First let's see how to connect to SQL Server from R
To load data from SQL Server into R, you first need to create an ODBC Data Source, I created one named LocalSqlServer, this DSN Data Source points to my local SQL Server 2016 instance.
After you have created your ODBC data source, you need to load the (RODBC) package, create a connection, execute a query, display the results and close the connection.. Here is what it looks like in the R Console
> require(RODBC)
>db <- odbcConnect("LocalSqlServer")
> Sample <- sqlQuery(db, "select * from sysobjects", stringsAsFactors=FALSE)
> head(Sample)
> close(db) #close ORDBC connection
name id xtype uid info status base_schema_ver
1 sp_MSalreadyhavegeneration -1073624922 P 4 0 0 0
2 sp_MSwritemergeperfcounter -1072815163 P 4 0 0 0
3 TABLE_PRIVILEGES -1072372588 V 3 0
4 sp_replsetsyncstatus -1071944761 X 4 0 0 0
5 sp_replshowcmds -1070913306 P 4 0 0 0
6 sp_publishdb -1070573756 P 4 0 0 0
Now that we know how to connect to SQL Server, let's see if we can get the twitteR package to work, head on over to my other post
Using twitter from the R console to see how to install twitterR and how some queries are executed.
Assuming the twitteR package is installed, we need to load it to use it
> library(twitteR)
> setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
[1] "Using direct authentication"
Now it is time to do a simple search
> tweets <- searchTwitter('#rstats', n=6)
> tweets
Running that command should have given you some results. However what we need as output is a dataframe, this way we can easily load this into a table in SQL Server. You can use the twListToDF function
> tweets <- searchTwitter('#rstats', n=6)
> twListToDF(tweets)
The output now looks different
1 RT @rquintino: @Mairos_B #sqlsatportugal session: all about R in #SqlServer 2016 #rstats https://t.co/DHrqIZrz1e
2 a quick script to use imgcat in #rstats https://t.co/fpUlgWNX33 https://t.co/AhCCMLewCH
3 RT @KirkDBorne: Useful packages (libraries) for Data Analysis in R: https://t.co/haRKopFyly #DataScience #Rstats by @analyticsvidhya https:…
4 Hey #rstats tweeps, do you have any readings to recommend on sensitivity analysis? Books/articles/websites all welcome.
5 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
6 RT @KirkDBorne: A Complete Tutorial on Time Series Modeling in R: https://t.co/7oI6JKyU4E #MachineLearning #DataScience #Rstats by @Analyti…
favorited favoriteCount replyToSN created truncated replyToSID
1 FALSE 0 NA 2016-02-20 20:29:54 FALSE NA
2 FALSE 0 NA 2016-02-20 20:24:50 FALSE NA
3 FALSE 0 NA 2016-02-20 20:16:25 FALSE NA
4 FALSE 0 NA 2016-02-20 20:11:08 FALSE NA
5 FALSE 0 NA 2016-02-20 20:11:06 FALSE NA
6 FALSE 0 NA 2016-02-20 20:02:05 FALSE NA
id replyToUID
1 701141750161784834 NA
2 701140474019577856 NA
3 701138356466483204 NA
4 701137026075140096 NA
5 701137018508722176 NA
6 701134750296227840 NA
statusSource
1 Mobile Web (M5)
2 Tweetbot for Mac
3 Twitter for Android
4 Twitter Web Client
5 Twitter for iPhone
6 Twitter Web Client
screenName retweetCount isRetweet retweeted longitude latitude
1 psousa75 3 TRUE FALSE NA NA
2 millerdl 0 FALSE FALSE NA NA
3 diana_nario 50 TRUE FALSE NA NA
4 emjonaitis 0 FALSE FALSE NA NA
5 caryden 41 TRUE FALSE NA NA
6 ArkangelScrap 41 TRUE FALSE NA NA
The output now has a lot more stuff, you can see if it has been retweeted or favorited as well as the latitude, longtitude and more.
Time to send the result to SQL Server
There is a problem with how the sqlSave method creates the table, it makes the created column a float and then all inserts fail.
Run the following DDL statement in your database
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
| CREATE TABLE [dbo].[tweets](
[rownames] [nvarchar](255) NULL,
[text] [nvarchar](255) NULL,
[favorited] [nvarchar](5) NULL,
[favoriteCount] [float] NULL,
[replyToSN] [nvarchar](5) NULL,
[created] [datetime] NULL,
[truncated] [nvarchar](5) NULL,
[replyToSID] [nvarchar](5) NULL,
[id] [nvarchar](255) NULL,
[replyToUID] [nvarchar](5) NULL,
[statusSource] [nvarchar](255) NULL,
[screenName] [nvarchar](255) NULL,
[retweetCount] [float] NULL,
[isRetweet] [nvarchar](5) NULL,
[retweeted] [nvarchar](5) NULL,
[longitude] [nvarchar](255) NULL,
[latitude] [nvarchar](255) NULL
) ON [PRIMARY]
|
Now we are ready to save our twitter data in SQL Server, here is how you do it.
> sqlSave(db,append = TRUE,colnames = FALSE,verbose = FALSE, addPK = FALSE, twListToDF(tweets))
Here is what the results look in SSMS
Make sure that you have append = TRUE, if you make it FALSE, the table will be recreated every time but since the command issued creates the wrong data type for the created column, you can't use it
So to recap, here are the commands you need after you loaded the twitterR package and have setup OAuth
> tweets <- searchTwitter('#rstats', n=6) #store the twitter search results
> twListToDF(tweets) #convert to dataframe
> require(RODBC) # load RODBC package
> db <- odbcConnect("LocalSqlServer") #connect
> sqlSave(db,append = TRUE,colnames = FALSE,verbose = FALSE, addPK = FALSE, twListToDF(tweets)) #insert data into your table
> close(db) #close ORDBC connection