R you connecting to SQL Server?

May 3, 2011
As I've mentioned some of my pet projects require access to statistics and data mining algorithms.  I have played with a few different approaches but none of them really had everything I was looking for in terms of methods.  I eventually decided to give R a try.  For those who are not familiar R is a free software environment for statistical computing and graphics that is available for a wide variety of UNIX platforms, Windows and MacOS.  I had used R in the past but only on a very limited basis and never against a relational data source so I was quite suprised with what I found. 
R has a massive following with thousands of contributors to an online library of algorithms.  Need access to a bayesian algorithm for classification, no problem.  Want to draw familial diagram with genetic markers, piece of cake.  It's all in there and there are quite a few nice resources online to help you learn the platform.  One of the best I have found is http://www.statmethods.net/ but there are plenty of others.  Even if what you are looking for doesn't exist in R (unlikely) you can engage with the community to get where you need to go.
Once I determined that R has all the methods I was looking for the next step was accessing my data from within R.  Sure, you can load flat files directly into R but my data is all stored in a relational database.  As it turns out, R can handle that too.  Let's get started.
R has a terrific package management system that allows you to select a package repository called a CRAN mirror (there are many) and install additional packages.
To connect to a relational database you'll want to grab the RODBC package by selecting 'Install package(s)...' and finding it in the list after selecting a nearby CRAN mirror. Once it has been installed you can load into your current sessions as follows:
Now for the fun part.  In the code below I connect to the database and establish a channel that can then be used for running your queries.  Once I have a channel I get a listing of tables in the Sales schema of the AdventureWorks sample database.  Finally I run a query to return some data.
channel <- odbcDriverConnect(connection = "Driver={SQL Server Native Client 10.0};Server=localhost;Database=AdventureWorks;Trusted_Connection=yes;")
sqlTables(channel, schema="Sales")
sqlQuery(channel,"select top 10 * from Sales.SalesOrderHeader")
Of course you might not be using SQL Server 2008 or you might be using a different security model. Regardless just take a trip over to www.connectionstrings.com and find the ODBC connection string that fits your scenario and plug it into the odbcDriverConnect function.  Now that you can connect R to your relational environment you can take advantage of the statistical and data mining algorithms within, free of charge.
blog comments powered by Disqus

About the author

Frank DeFalco has worked as software developer and informatics professional since 1996.  He is currently employed by Johnson & Johnson where he manages the Health Informatics Center of Excellence.


Reach me via email at frank (at) thedefalcos (dot) com.


Follow me on twitter @fjdefalco