As interest in data science has surged, the R language has jumped from a statistical niche to the programming mainstream.
“R, relative to other languages that I’ve come across, is accessible, readable and versatile,” says Jamie Hall, who uses R in his work as a consultant at Synapse Energy Economics.
So what can you do with R?
I’d been hearing about it for years before I decided to learn R in the early 2010s.
I saw a presentation by a journalist from The New York Times in which she created hundreds of graphics with a few lines of code. It looked like magic.
Better yet, it wasn’t just for graphics. It did all the statistical tasks that typically required expensive software like SAS or SPSS.
R had been around for a while by then, and there was material online about what you can do with R. So I figured I’d just ask professor Google and dive right in.
It all seemed very foreign: Data tables were “frames,” simple calculations seemed to require complex mathematical notation I hadn’t seen since college, and you used a weird “<-” arrow everywhere I thought there should be an “=”.
Worse yet, the “easy” books seemed more than a bit intimidating. Even “R in a Nutshell” clocked in at more than 600 pages.
But I eventually discovered what many R users already knew.
“Learn by doing,” says David Smith, a cloud advocate at Microsoft who works with R.
Unfortunately for me, I didn’t think to tackle small, manageable tasks. I kept looking for something “big” to do with R.
So I picked it up, then put it down for more than a year.
The project that made me feel like a “real” R user came along when a co-worker asked me to help him analyze changes in test scores at every school district and school in Texas.
We needed to do the analysis, and because not everyone working on the project “got” statistics, we needed to create graphics for each test, each school, each grade and each school district.
It needed to be ready in two weeks.
The only way I knew of to create that many graphics that quickly was with R’s ggplot2 package.
I had my big project. Thanks to R, I crushed it.
So depending on your viewpoint, it either took me more than a year to learn R or about 10 days.
But while 10 working days is enough to master the basics of R, I definitely should not have started with a major work project on a tight deadline.
“The most important thing, in my opinion, is putting the language to work by working on a personal project,” says Hall.
He suggests “something very simple” like calculating your average weekly personal expenses.
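As a rough illustration of the kind of starter project Hall describes, here is a minimal sketch in base R. The expense figures, column names and variable names are made up for this example, so swap in your own records. It groups purchases by calendar week, totals each week, then averages those totals.

```r
# Made-up expenses for illustration only; replace with your own data.
expenses <- data.frame(
  date   = as.Date(c("2024-01-01", "2024-01-03", "2024-01-06",
                     "2024-01-08", "2024-01-12", "2024-01-14")),
  amount = c(42.50, 18.00, 63.25, 30.00, 12.75, 55.50)
)

# Assign each purchase to the week it falls in (weeks start on Monday).
expenses$week <- cut(expenses$date, breaks = "week")

# Add up spending within each week.
weekly_totals <- aggregate(amount ~ week, data = expenses, FUN = sum)

# Average of the weekly totals.
average_weekly <- mean(weekly_totals$amount)

print(weekly_totals)
print(average_weekly)
```

Note the “<-” arrow doing the job of “=”, and the data “frame” standing in for a table: the same two quirks that looked so foreign at first.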
“I’m still surprised that there’s a fairly widespread impression that R is not good at general-purpose computing tasks,” says Bob Rudis, co-author of “Data-Driven Security” and an avid R user.
“I have R running on things like the Raspberry Pi, where it’s driving an e-paper display.”
In fact, there are more than 16,000 R packages for a wide variety of tasks.
Those packages tend to have substantially better documentation than most open-source software. In addition to the documentation, many packages have entire academic papers devoted to explaining them.
“I always get excited when I stumble upon a function or a package that another regular user like myself has built and that seamlessly solves a problem that I was having,” says Hall.
Still, R’s original community emerged from university statistics departments, so it excels at data wrangling and has packages designed for applying an enormous number of statistical and data methods.
Of particular value are tools for web scraping, dealing with social media data, text analysis, machine learning and complex survey statistics.
Once your analysis is done, you can put an interactive version of your data online using Shiny.
To be successful, keep tackling projects.
“Finding a new problem to solve daily, even if it’s a small one, will really build up the R ‘muscle memory,’” Rudis says.
Those projects don’t need to be complex.
“When I first moved to the Napa Valley, I had a goal of visiting every winery by bike. So I found a data source on wineries online, and made myself a map using R,” says Microsoft’s Smith.
There are a lot of great resources for learning R. Here are some of the best by category.
Online Courses
The following are self-contained online courses for people to learn R. These are especially good when you don’t have a specific problem to solve.
- Swirl: Teaches you programming by using R in self-paced, interactive exercises.
- RYouWithMe page: Free information to start learning R created by members of the R-Ladies Sydney user group.
- Data Literacy and Data Visualization: Not specifically designed to teach R, the YouTube recordings of professor Bear Braumoeller’s Ohio State University class are perfect for those who need an easily followed introduction to statistical methods.
- Adventures in R: Free, open-source, eight-week college-level course on R for data science and statistics with the option to pay for additional time and project review with the professor, Dr. Kelly Bodwin.
Tutorials and Guides
When you want to figure out how to do a specific task in R – and I recommend learning this way – these sites have step-by-step instructions to get you there.
- R-exercises: This site has more than 400 exercises for learning to do things in R. That includes tutorials related to investing, lots of information on using “big data” and even a specific series on learning R for doctors.
- ComputerWorld: R Language: Thanks to the passion of the IDG executive editor for data and analytics, the ComputerWorld website has a steady stream of handy articles and videos.
- R-Bloggers: It’s not pretty, but this site aggregates blog posts about R from across the web. So it’s a great place to search for tutorials on anything from A to Zillow.
- CRAN Task Views: CRAN, the main archive of R packages, has a list of topical guides.
Books
These days, there are some great books about using R for various purposes. The following are good for those starting out.
- “The Book of R”: This beginner-friendly book is a bit like taking a course on R. As an added benefit, a sample chapter and the supplemental materials are available for free so you can try before you buy.
- “R Cookbook”: Once you have a problem to solve, this recently updated book gives the step-by-step instructions to take on hundreds of different tasks.
- “R for Data Science”: Definitely a more advanced book. If you use SPSS or SAS, this is for you. Available in print and online.
Tools
You’ll want to download these tools for programming in R.
- RStudio: RStudio itself is the premier code editor for writing and running R scripts. The company’s site also offers a wealth of tutorials and cheat sheets.
- R Commander: Sociologist John Fox created this point-and-click tool for doing common statistical actions in R. It’s great because after you use the graphical interface, you can look at the code it generated. He even wrote a book about it.
- R Programming Compiler: The best computer is the one you have with you. This app lets you write and run R scripts on your iPhone or iPad. It also has a version for Android devices. It’s free to install and run basic scripts, but you need a subscription for it to be really useful.
Sports Analytics
For those in fantasy leagues, applying sports statistics is a great way to learn R and possibly gain an edge.
- “Analyzing Baseball Data with R”: Baseball junkies, this book is for you. This is really for people who already know the basics of sabermetrics (baseball statistics). There’s also a similar book on analyzing basketball.
- Fantasy Football Analytics: This site is not for the faint of heart, but it’s worth a look anyway. The series of blog posts on making football projections provides great examples of real-world R usage. They’ve even published all of their tools as an easy-to-run R package called ffanalytics that lets you do a lot of fantasy football analysis in just one or two lines of code.
- r/sportsanalytics: A lot of interesting content related to sports data gets posted and discussed on the Sports Analytics subreddit. Try searching for “R” to find compelling stuff.