R is a programming language primarily geared for statistical computing. Within psychology, it is fast becoming SPSS’s main competitor when it comes to conducting analysis. I have been using R for most of my work over the past 18 months, and I absolutely love it; it was not an exaggeration when I said in a previous post that R is my favourite thing EVER (OK, it’s top among work-related things, at least). I am a true R-convert, and I preach its existence to anyone who will listen. Halleluj-R!
This week on Twitter, someone asked others to list advantages of using R over SPSS in a teaching situation. Although I don’t use R for teaching (more on this below), it forced me to reflect on WHY I love R so much. So, I decided to list some of the core advantages I see R as having (in no particular order). In the spirit of fairness, I also reflected on some key disadvantages of using R.
I hope others find this of use before deciding whether to plunge into the R-world. I say dive right in.
1. It’s Free. R is an open-source venture, so EVERYTHING you need in R is free. Yup; FREE. To me, this is so important, because it allows the skills you develop whilst learning R to travel with you regardless of where your next job is. Imagine only knowing SPSS, but then moving to an institution that doesn’t have an SPSS licence. What do you do? Are you going to fork out for the individual licence fee yourself (which, by the way, will expire after a measly 12 months)?
2. Reproducible Analysis. It is very important that you be able to reproduce your analysis EXACTLY. It is embarrassing how many times in the past I have failed to reproduce the same final response time averages after repeating a trimming procedure in Excel. How could I trust my data, or myself? After all, you can’t record mouse clicks in different Excel menu options (unless you screen-capture your analysis session).
As R is a statistical programming language, you write scripts that will execute your analysis. So, you have a permanent record of your analysis steps, and will be able to reproduce your analysis exactly. More importantly, if you publish your script as supplementary material, ANYONE will be able to reproduce your analysis exactly. This is so important in today’s age of reproducible science.
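To give a flavour of what such a script might look like, here is a minimal sketch of a response time trimming step. The simulated data, the column names (`participant`, `rt`), and the cut-offs (a 200 ms floor, and 2.5 SDs above each participant's own mean) are assumptions chosen purely for illustration, not a recommended procedure.

```r
# Minimal sketch of a scripted trimming procedure (all values are illustrative).
set.seed(42)  # fix the random seed so the simulated data are identical on every run

# Simulated response times (ms) for 3 hypothetical participants, 100 trials each
rt_data <- data.frame(
  participant = rep(1:3, each = 100),
  rt          = rnorm(300, mean = 600, sd = 100)
)

# Step 1: remove anticipations (RTs faster than 200 ms)
trimmed <- rt_data[rt_data$rt >= 200, ]

# Step 2: remove RTs more than 2.5 SDs above each participant's own mean
upper <- ave(trimmed$rt, trimmed$participant,
             FUN = function(x) mean(x) + 2.5 * sd(x))
trimmed <- trimmed[trimmed$rt <= upper, ]

# Step 3: mean RT per participant after trimming
mean_rt <- aggregate(rt ~ participant, data = trimmed, FUN = mean)
print(mean_rt)
```

Because every step is written down, running the script again (or handing it to someone else) repeats exactly the same trimming in exactly the same order.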
3. Packages. R has thousands of packages, which are add-ons to the core R system that allow you to do specific tasks more easily. Each package is a collection of functions written by other R users. For example, if you wish to use linear mixed effects models (which are becoming more popular in psychology), you can download the lme4 package, which allows you to conduct this analysis. Want to do structural equation modelling? Download the sem package. There is a package for pretty much every statistical concept you can think of; importantly, all come fully-documented with examples. Again, these are all FREE. You don’t need to buy an AMOS licence as you would in SPSS. Why limit yourself to a set of pre-defined analytical tools? Get R.
4. Programmable Functions. Can’t find a package that does the job you want? No problem! As R is primarily a programming language, you can just write a function yourself that will do the job. You can even contribute to R by publishing your own package containing your new functions if you like.
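As a small example of writing your own function, here is a sketch of one that computes the standard error of the mean. The function name `std_error` and its `na.rm` argument are my own choices for this illustration.

```r
# Hypothetical helper: standard error of the mean.
# na.rm = TRUE drops missing values before the calculation.
std_error <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sd(x) / sqrt(length(x))
}

# Usage on a small made-up vector of scores, including one missing value
scores <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
print(std_error(scores))
```

Once defined, the function can be reused anywhere in your script, or bundled into a package of your own.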
5. Sexy Plots. R has some absolutely stunning plotting capabilities (for example, by using the ggplot2 package, which I highly recommend!). These plots are of a publishable quality, and can be saved as vector graphics, so they will not lose resolution when your publisher scales your plots up. Check out some of these example plots using ggplot2 for just the tip of the iceberg as to what R is capable of: http://docs.ggplot2.org/current/
6. Forces Deeper Engagement with Statistical Concepts. Because R isn’t a menu-driven point & click environment, you have to code your analysis in a script. For me, this forces you to become more conversant with the techniques you are using, lest your misunderstanding lead you to code your analysis incorrectly. Even just doing plain data trimming in R makes you feel more intimate with your data, because you are coding how that data is to be manipulated. It forces you to think of EVERYTHING you are doing. SPSS can be executed with your eyes closed and your brain off.
7. Computational Simulations. I do a lot of computer simulations, and R is an absolute god-send for this. Again, this is due to R being primarily a programming language. I have conducted simulations of human cognition, distribution of p-values under a null hypothesis, even annual rainfall in the UK! All in one environment. R is so versatile. If it involves numbers, and it can be programmed, R will do the job.
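For instance, the distribution of p-values under a null hypothesis can be simulated in a handful of lines. This is only a rough sketch: the number of simulations, the group size, and the choice of an independent-samples t-test are assumptions made for the example.

```r
# Sketch: distribution of p-values when the null hypothesis is true.
set.seed(123)        # make the simulation reproducible

n_sims      <- 5000  # number of simulated experiments (illustrative)
n_per_group <- 20    # participants per group (illustrative)

# Each simulated experiment draws two groups from the SAME population,
# so any "significant" difference is a false positive.
p_values <- replicate(n_sims, {
  group_a <- rnorm(n_per_group)
  group_b <- rnorm(n_per_group)
  t.test(group_a, group_b)$p.value
})

# Under a true null the p-values should be roughly uniform,
# so the proportion below .05 should be close to 5%.
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)

hist(p_values, breaks = 20,
     main = "p-values under the null", xlab = "p-value")
```

Changing the population means of the two groups turns the same script into a quick power simulation.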
8. Data Scraping. R also has several packages that allow you to scrape data from the internet for analysis. This is great for data geeks like me, who like to explore government/sport/financial data sets just for fun. Using R, you can get the data, arrange it into a suitable format, conduct the analysis, and plot the results. All within the comfort of the R environment.
9. Great Community. Most open source ventures have great community spirit, but I find R has one of the best. Whenever you get stuck with how to do something (and you WILL get stuck), you can be almost certain to find an answer by a quick Google search, because someone in the community will have written how to do what you need. If they haven’t, there are several resources to use to seek help (such as StackExchange).
1. Steep Learning Curve. R is quite difficult to learn. I remember when I first saw an R script I almost threw up in my mouth. It was so intimidating. R takes a while to get comfortable with. For months whilst learning R I was thinking how easily I could do a certain analysis in SPSS, or how easy a plot would be to create in Excel. But, now I’m getting to grips with it, I can honestly say it is all worth it and that you should push through the pain barrier. At the end of it all, you will be left with a superior environment for your analytical needs.
2. Not Ideal for Undergraduates. Related to the above, it’s perhaps not best suited as an introductory software package for newbie-statisticians in psychology. This is because students in psychology often struggle with the statistical concepts themselves, so it would seem cruel to force them to learn a daunting programming language at the same time. (I have no data on this, and would love to hear from others who HAVE used it successfully at undergraduate level.) Also, at institutions where many staff teach on one module, you would have to ensure all of the staff are fluent in R before deploying it at undergraduate level. Everyone in psychology knows how to use SPSS, but this isn’t true for R.
That being said, I think R is perfect for graduate level statistics. At this stage students should be comfortable with the basics of statistics so they can instead focus on learning to code.
3. It Can be Slow. R is an interpreted language rather than a compiled one like C++, so executing a large set of analyses can take some time. Of course, this only really applies to LARGE sets of analyses. For example, at the moment I am running a computer simulation of performance on the Flanker task. I am trying to find best-fitting parameters by a repeated search across many potential values. For each run of the model, I simulate 50,000 trials, arrange the synthetic data, compare it to human data, find the discrepancy, and repeat until the discrepancy is minimal. This is being repeated for EACH of 30 participants in my data set. The whole simulation is due to take more than 3 weeks. But, this is quite an unusual situation. Standard analyses of the type you would do in SPSS are almost instantaneous. Still, it’s something to bear in mind.
(DISCLAIMER: The slow speed of my simulation could be due to my inefficient coding rather than R!)
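One common way that coding style can slow R down is growing a vector inside a loop instead of using a vectorised operation. The sketch below is a generic illustration of that point, not the Flanker simulation itself; the function names and the vector length are made up for the example, and the timings will differ from machine to machine.

```r
# Sketch: the same calculation written as a loop and as a vectorised operation.
set.seed(1)
n <- 20000
x <- runif(n)

# Loop version: grows the result one element at a time,
# which forces R to copy the vector on every iteration.
slow_square <- function(x) {
  out <- c()
  for (i in seq_along(x)) {
    out <- c(out, x[i]^2)
  }
  out
}

# Vectorised version: a single operation on the whole vector.
fast_square <- function(x) {
  x^2
}

loop_time <- system.time(res_loop <- slow_square(x))["elapsed"]
vec_time  <- system.time(res_vec  <- fast_square(x))["elapsed"]

# Both approaches should give the same answer; only the speed differs.
stopifnot(isTRUE(all.equal(res_loop, res_vec)))
cat("Loop:", loop_time, "s; vectorised:", vec_time, "s\n")
```

If a script feels slow, checking for loops like the first one is often a good place to start before blaming R itself.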