Bayesian Estimation of Partial Correlations

Correlations are a popular analysis tool in psychology to examine the extent to which two variables are related. For example, one might be interested in whether there is a relationship between shoe size and height. But what if you want to explore the relationship between two variables whilst controlling for the effects of a third variable? For example, you might be interested in the relationship between shoe size and height whilst controlling for age. In order to do this, one needs to use partial correlations.

Whilst there are relatively straightforward methods for calculating partial correlations using frequentist statistics, I recently wondered whether there is a Bayesian parameter estimation version of this analysis. A cursory glance at the internet didn’t bring up much [1], so I set about developing my own.

Below, I briefly cover the frequentist implementation before providing an overview of the Bayesian parameter estimation version.

Frequentist Implementation

Assume we have three variables, X, Y, and Z. We are interested in the relationship between X and Y, but need to account for the effect of Z. I simulated 75 data points (these could be participants in a study), and calculation of a standard Pearson’s correlation between X and Y provides rx,y(74) = 0.48, p<0.05. A “significant” relationship! However, this doesn’t take into account the effect of Z.

Examination of all possible correlations between X, Y, and Z reveals a strong relationship between Z and the other two variables:

  • rx,z(74) = 0.58, p<0.05.
  • ry,z(74) = 0.62, p<0.05.

The relationship between X and Y whilst controlling for Z (denoted here as rx,y|z) is given by the following formula:

r_{x,y|z} = \frac{r_{x,y} - (r_{x,z} \times r_{y,z})}{\sqrt{1 - r_{x,z}^{2}} \times \sqrt{1 - r_{y,z}^{2}}}

When calculating this partial correlation between X and Y, controlling for Z, we get rx,y|z(74) = 0.19, p = .10. Note now that the relationship between X and Y is no longer “significant”. The Figure below shows the relationship with Regular (i.e., Pearson’s) and Partial correlation.
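As a quick sanity check, plugging the three correlations reported above into the formula reproduces the partial correlation. This is a Python sketch for illustration only; the `partial_corr` helper is a made-up name, not part of the R analysis below.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y, controlling for z."""
    num = r_xy - r_xz * r_yz
    denom = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    return num / denom

# The three pairwise correlations reported above
print(round(partial_corr(0.48, 0.58, 0.62), 2))  # 0.19
```

The strong X–Z and Y–Z correlations soak up most of the raw rx,y = 0.48, leaving only 0.19 once Z is partialled out.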

Bayesian Parameter Estimation

This is all well and good, but what if you want a Bayesian estimate of the partial correlation parameter? I had a quick check of the internet, and couldn’t see anything. (I should say that I didn’t spend long looking; I like to do some things from scratch just so I can learn something, and this was one of those times.) Inspired by the SUPERB book by Lee & Wagenmakers on Bayesian modelling, I devised a Bayesian graphical model of partial correlation:


This model extends the one presented in the Lee & Wagenmakers book, which calculates the correlation coefficient between just two variables. There are two extensions in the current model over that of Lee & Wagenmakers:

  • The model is extended to accommodate three variables of interest instead of two. As such, three correlation parameters are estimated from the data: rx,y, rx,z, and ry,z [2].
  • The model has a new parameter, \theta , which denotes the partial correlation of interest, in the current example rx,y|z.

The model assumes—as does the one by Lee & Wagenmakers—that the data are modelled as draws from a multivariate normal distribution. The parameters of this distribution are the means of each of the three variables (denoted \mu_{1}, \mu_{2}, \mu_{3}) and their standard deviations (\sigma_{1}, \sigma_{2}, \sigma_{3}), as well as the correlation coefficients that link them (r_{1,2}, r_{1,3}, r_{2,3}). I use broad, rather uninformative priors (which of course can be tweaked later if one wishes).
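To make that structure concrete, here is a small sketch (in Python, with made-up parameter values purely for illustration) of how the covariance matrix of the multivariate normal is assembled from the standard deviations and correlations, mirroring the T matrix in the JAGS code further down:

```python
import numpy as np

# Made-up values, purely to illustrate the structure
sigma = np.array([1.0, 1.2, 0.9])   # sigma_1, sigma_2, sigma_3
r12, r13, r23 = 0.48, 0.58, 0.62    # r_{1,2}, r_{1,3}, r_{2,3}

# Covariance matrix: variances on the diagonal,
# r_ij * sigma_i * sigma_j off the diagonal
cov = np.array([
    [sigma[0] ** 2,             r12 * sigma[0] * sigma[1], r13 * sigma[0] * sigma[2]],
    [r12 * sigma[0] * sigma[1], sigma[1] ** 2,             r23 * sigma[1] * sigma[2]],
    [r13 * sigma[0] * sigma[2], r23 * sigma[1] * sigma[2], sigma[2] ** 2],
])

# A valid covariance matrix must be symmetric and positive definite
assert np.allclose(cov, cov.T)
assert np.all(np.linalg.eigvalsh(cov) > 0)
```

The positive-definiteness check matters here: as discussed in footnote 2, it is exactly this property that causes trouble when one tries to invert the matrix for JAGS's precision-parameterised dmnorm.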

The parameter of interest, \theta , inherits the distributions from the model parameters pertaining to the correlation coefficients (r_{1,2}, r_{1,3}, r_{2,3}), and from them generates a new distribution of the partial correlation parameter (according to the first equation in this post). The distribution of interest for inference is now this new distribution of the partial correlation parameter.


Recall that the frequentist estimate of the partial correlation coefficient was rx,y|z(74) = 0.19. Below is a density plot of the posterior distribution of the \theta parameter from the Bayesian graphical model above [3].

The mode of this posterior distribution was 0.205, with a 95% Highest-Density Interval spanning from -0.03 to 0.39. Note that whilst the modal estimate of the partial correlation parameter was close to that of the frequentist analysis, the Bayesian parameter estimation provides much more information, in particular regarding the uncertainty of this estimate.


I am not sure if this implementation is correct, so I advise using it with caution. But, it proved an interesting exercise, and I am not aware of a current implementation of this.


### initial set up
rm(list = ls())

# set working directory
setwd("D:/Work/Other/Blog_YouTube code/Blog/Bayesian Partial Correlation")

# load libraries (assumed from the functions used below)
library(Hmisc)   # rcorr()
library(ppcor)   # pcor()
library(R2jags)  # jags()
library(coda)    # as.mcmc(), HPDinterval()

# set seed for reproducible code
set.seed(42)  # any fixed value will do

### define functions

# function to simulate partially correlated data
get_data <- function(n){
  x <- rnorm(n, 0, 1)
  y <- .5 * x + rnorm(n, 0, 1)
  z <- .3 * x + .6 * y + rnorm(n, 0, 1)
  data.frame(x = x, y = y, z = z)
}

### get data & conduct frequentist analysis

# generate the data
n <- 75
x <- get_data(n)

## plot the data with linear model fits
op = par(cex.main = 1.5, mar = c(5,6,4,5) + 0.1, mgp = c(3.5, 1,0), 
         cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, bty = "n", las = 1)

# do pairs plot
pdf("pairs.pdf", width = 6, height = 6)
pairs(x, upper.panel = NULL, pch = 19)
dev.off()

# do correlation plot
pdf("correlation.pdf", width = 6, height = 6)
plot(x$y, x$x, pch = 17, ylab = "", xlab = "")

# model not controlling for z
mod_1 <- lm(x$x ~ x$y)
abline(a = mod_1$coefficients[1], b = mod_1$coefficients[2], lwd = 3, lty = 1, 
       col = "red")

# model controlling for z
mod_2 <- lm(x$x ~ x$y + x$z)
abline(a = mod_2$coefficients[1], b = mod_2$coefficients[2], lwd = 3, lty = 1, 
       col = "blue")
legend("bottomright", c("Regular", "Partial"), lty = c(1, 1), bty = "n", 
       lwd = 3, col = c("red","blue"), cex = 1.5)
dev.off()

# get the frequentist estimate of correlation & partial correlation
# note that the correlation between x and y is no longer significant when
# controlling for z
freq_r <- rcorr(as.matrix(x))
freq_partial_r <- pcor(x)

### Conduct Bayesian parameter estimation

# declare the JAGS model code
model_code <- "
model {

# data
for(i in 1:n){
  x[i, 1:3] ~ dmnorm.vcov(mu[], TI[,])
}
# priors
mu[1] ~ dnorm(0, .001)
mu[2] ~ dnorm(0, .001)
mu[3] ~ dnorm(0, .001)

lambda[1] ~ dgamma(.001, .001)
lambda[2] ~ dgamma(.001, .001)
lambda[3] ~ dgamma(.001, .001)

r_xy ~ dunif(-1, 1)
r_xz ~ dunif(-1, 1)
r_yz ~ dunif(-1, 1)

# reparameterisation
sigma[1] <- 1/sqrt(lambda[1])
sigma[2] <- 1/sqrt(lambda[2])
sigma[3] <- 1/sqrt(lambda[3])

T[1, 1] <- 1 / lambda[1]
T[1, 2] <- r_xy * sigma[1] * sigma[2]
T[1, 3] <- r_xz * sigma[1] * sigma[3]

T[2, 1] <- r_xy * sigma[1] * sigma[2]
T[2, 2] <- 1 / lambda[2]
T[2, 3] <- r_yz * sigma[2] * sigma[3]

T[3, 1] <- r_xz * sigma[1] * sigma[3]
T[3, 2] <- r_yz * sigma[2] * sigma[3]
T[3, 3] <- 1 / lambda[3]

TI[1:3, 1:3] <- T[1:3, 1:3]

# partial correlation calculation
num <- r_xy - (r_xz * r_yz)
denom <- sqrt(1 - pow(r_xz, 2)) * sqrt(1 - pow(r_yz, 2))
partial <- num / denom
}
"

# model details
jags_info <- list("x", "n")
parameters <- c("r_xy", "r_xz", "r_yz", "mu", "sigma", "partial")

# fit the model
sample <- jags(jags_info, inits = NULL, parameters, 
               model.file = textConnection(model_code), n.chains = 1, 
               n.iter = 10000, n.burnin = 500, n.thin = 5, DIC = F)

# look at the overview of the parameter estimates
print(sample)

# extract the posterior samples of the partial correlation &
# calculate the 95% HDI
posterior <- sample$BUGSoutput$sims.matrix[, "partial"]
sample_mcmc <- as.mcmc(posterior)
hdi <- HPDinterval(sample_mcmc)

###  plot

# do some preparation for plotting by finding the mode of the posterior
dens <- density(posterior)
posterior_mode <- dens$x[which.max(dens$y)]

# do the plot
pdf("bayesian_estimate.pdf", width = 6, height = 6)
plot(dens, xlim = c(-1, 1), ylim = c(0, max(dens$y) + 0.55), 
     main = "", xlab = "Partial Correlation Estimate", lwd = 2)

# add the mode of the sample & the HDI etc.
lines(x=c(posterior_mode, posterior_mode), y=c(2.5, 3.8), lty = 2, lwd = 2, 
      col = "red")
text(x= posterior_mode, y = max(dens$y) + 0.5, paste("Posterior mode =",
                                         round(posterior_mode, 3), sep = " "), 
     cex = 1.2, col = "red")
lines(x = c(hdi[1],hdi[1]), y = c(0,0.2), lwd = 2, col = "red")
lines(x = c(hdi[2],hdi[2]), y = c(0,0.2), lwd = 2, col = "red")
lines(x = c(hdi[1],hdi[2]), y = c(0.1,0.1), lwd = 2, col = "red")
text(x = (hdi[1] + hdi[2]) / 2, y = 0.325, "95% HDI", cex = 1.2, col = "red")
dev.off()

  1. Note that there is a paper by Wetzels & Wagenmakers (2012) which demonstrates the calculation of Bayes factors for correlations and partial correlations using summary statistics (i.e., Pearson’s r and the degrees of freedom). Note also (as I say in the main post) that I didn’t search that hard for a solution to this problem, as I was keen to make my own method. So, there is probably a better way of doing this that is already available. 
  2. Note that in the model of Lee & Wagenmakers with just two variables, one must take the inverse of the variance–covariance matrix when passing it to the dmnorm function in JAGS. With three or more variables, there are issues when taking the inverse of the matrix because the result is not positive definite. (See this link for the help I received from the JAGS owner on this issue.) The solution involves using JAGS 4.3.0 or newer, and using the dmnorm.vcov function instead, which takes the variance–covariance matrix directly. 
  3. As this is just a toy example, I kept the model-fit time as quick as possible. I generated 10,000 samples from the posterior distributions, treating the first 500 as burn-in samples. The thinning rate was set to 5, and just one chain was used. This is not optimal, but the fit time was quite slow for the model. 

Low Power & Effect Sizes

Yesterday I posted the following tweet which has since turned out to be my most popular tweet EVER with hundreds of retweets and “likes” in 24 hours:

My motivation for the tweet was quite straightforward. I have recently been emailing academics in my department every week in an attempt to raise awareness of topics associated with increasing the information value of the research we conduct as a department. This week’s topic was “Power”. In my email—in which I included a copy of Button et al.’s (2013) excellent paper on low power in the neurosciences—I mentioned in passing that power is not just an issue for statistical significance. I have heard from people before that low power is only an issue when interpreting null results, and that if a study produces a significant outcome, then power is not an issue.

To pre-empt this response to my encouragement to increase the power of our studies, I said in my email: “Studies with low power and significant effects have been shown to over-estimate effect sizes, meaning your low-powered study—although significant—is not giving you precision.”

As soon as I sent the email, I realised that I couldn’t recall ever reading a study that had demonstrated this. Now, I knew that such a study (or studies) would have been conducted, but I realised that I had never actually read it myself. It turns out that such studies have indeed been conducted before, as people helpfully pointed out to me on Twitter in response to my tweet:

As I was unaware of these studies—plus it was a Sunday, and I was a little bored—I thought that instead of doing a literature search I would code a simulation demonstrating the inflation of effect sizes in low-powered (but significant) studies, the results of which I emailed to my department to demonstrate that what I had said was indeed the case. Then I thought, “Well, I haven’t tweeted much this year, so why not put it on Twitter, too.”

The incredible engagement I have had with this tweet—I propose—is due to this being a rather under-appreciated fact. Indeed, I “knew” that low-powered studies over-estimate effect sizes, but I didn’t KNOW it in the sense that I had seen hard evidence for it.


Details of Simulation

Because my tweet was made in passing, I didn’t explain the simulation’s implementation in much detail. I discuss it here in case others want to extend the simulation in some way.

The effect size of interest is a measure of correlation between two measures. I arbitrarily chose IQ (mean = 100, SD  = 20) and response time (mean = 600ms, SD = 80ms). I fixed the “true” effect size to be r = 0.3. It turns out that to obtain 80% power for an r=0.3 requires 85 subjects. In my simulation, I wanted to explore a wide range of sample sizes, so chose the set 10, 20, 30, 50, 85, 170, 500, and 1000.
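That sample-size figure can be checked with the usual Fisher-z approximation for the power of a correlation test. This is a stdlib-only Python sketch; `n_for_power` is a made-up helper name, and the hard-coded quantiles correspond to a two-tailed alpha of .05 and 80% power.

```python
import math

def n_for_power(r, z_alpha=1.959964, z_beta=0.841621):
    """Approximate N needed to detect correlation r,
    via the Fisher-z transformation: N = ((z_a + z_b) / atanh(r))^2 + 3."""
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

print(n_for_power(0.3))  # 85
```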

For each sample size—N—I simulated 1,000 “studies”. For each simulated study, the following procedure occurred:

  • Sample N draws from a multivariate normal distribution with the means and SD for IQ and RT as above and a population correlation coefficient of 0.3
  • Conduct a Pearson’s correlation between the two samples
  • If the correlation was significant, store the observed correlation coefficient in a new data frame
  • If the correlation was not significant, move on without storing anything
  • After 1,000 studies are completed, plot a boxplot of the observed effect sizes for N

The result was the image in the tweet.
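For readers who want to poke at the idea, the steps above can be sketched compactly in Python. This is not the original R code: it uses standardised variables (means and SDs do not affect r, so the IQ/RT scaling is dropped) and a Fisher-z approximation for the p-value rather than the exact t-test; the helper names are mine.

```python
import math
import random

random.seed(1)
TRUE_R = 0.3

def simulate_study(n, r=TRUE_R):
    """Draw n bivariate-normal pairs with population correlation r
    and return the sample Pearson correlation."""
    xs, ys = [], []
    for _ in range(n):
        a = random.gauss(0, 1)
        b = random.gauss(0, 1)
        xs.append(a)
        ys.append(r * a + math.sqrt(1 - r * r) * b)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def p_value(r, n):
    """Two-tailed p for H0: rho = 0, via the Fisher-z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Among *significant* studies only, small samples over-estimate r = 0.3
for n in [10, 30, 85, 500]:
    sig = [r for r in (simulate_study(n) for _ in range(1000))
           if p_value(r, n) < .05]
    print(n, len(sig), round(sum(sig) / len(sig), 2))
```

The mean significant effect size shrinks towards the true 0.3 as N grows: at N = 10 only correlations around 0.6 or larger can clear the significance threshold, so the surviving estimates are grossly inflated.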


Many limitations exist to this simulation, and I point interested readers to the material cited above in others’ tweets for a more formal solution. I didn’t intend for this to be a rigorous test, so it shouldn’t be taken too seriously; it was more for my own curiosity and also to provide a graphical image I could send to my colleagues at Keele to show the imprecision of effect sizes with low power. The particular outcomes are likely sensitive to my choice of means, SDs, r, etc. So, don’t generalise the specifics of this simulation, but maybe code your own tailored to your study of interest.

For me this was a bit of fun. Ten minutes of coding was time well spent on Sunday!

Do Olympic Hosts Have a “Home-Field” Advantage?

My wife questioned in passing yesterday whether summer Olympic hosts have a home-field advantage; that is, do the hosts generally win more medals in their hosting year than in their non-hosting years?

That a home-field advantage exists in many team sports is generally not disputed—see for example this excellent blog post by the Freakonomics team. But is this true for (generally) individual sports like the Olympics? Most of us Brits recall our amazing—and quite unusual—3rd place finish when we hosted the event in 2012, so anecdotally I can understand why suspicion of a home-field advantage exists.

But is it real? I am quite sure there is an answer to this question on the web somewhere, but I wanted to take this opportunity to try to find an answer myself. Basically, I saw this as an excuse to learn some web-scraping techniques in R.

The Data

Wikipedia holds a unique page for each Summer Olympic games. On these pages are medal tables tallying the number of Gold, Silver, and Bronze medals each competing nation won that year, as well as the total medal count. So, I wrote some code in R that visits each of these pages in turn, finds the relevant html table containing the medal counts, and extracts it into my work-space. I only looked at post-World War II games.

My idea was to plot all the medals won for each host nation for all years they have appeared at the games. I was interested in whether the total number of medals that the host won in their host-year was more than their average (mean) across all the games the host had appeared. If there is some sort of home-field advantage, generally we would expect their host-year to be one of their better years, certainly above their average Olympic performance.

The Results

Below is a plot of the results. The header of each plot shows who the host was that year, and the data in each plot shows the total number of medals won by the host in all of the games they have appeared in. To help interpretation of the results, for each plot, the vertical blue line shows the year that nation hosted the games, and the horizontal red line shows that nation’s mean performance across all their games.



I would take this data as providing some evidence that nations generally perform better when they are hosting the games. 11 out of 16 nations had their best year the year they hosted the games. All nations performed above average the year they hosted the games (although maybe Canada, 1976, just missed out).

The Real Conclusion (And the Code)

Coding in R is fun, and I look for any excuse to work on new projects. This is my first attempt at doing web scraping, and it wasn’t as painful as I thought it would be. Below is the code, relying a lot on the rvest R package which I highly recommend; check out this nice introduction to using it.

The code I wrote is below. It’s certainly not optimal, and likely full of errors, but I hope someone finds it of use. Although I tried to automate every aspect of the analysis, some aspects had to be manually altered (for example to match “Soviet Union” data with “Russia” data).


# clear workspace
rm(list = ls())

# set working directory
setwd("D:/Work/Blog_YouTube code/Blog/Olympic Medals")

# load relevant packages (assumed from the functions used below)
library(rvest)    # html(), html_nodes(), html_table()
library(dplyr)    # select(), filter(), mutate()
library(stringr)  # str_detect()

# suppress warnings
options(warn = -1)

### get a list of all of the host nations

# set the url and extract html elements
host_url <- "http://www.topendsports.com/events/summer/hosts/list.htm"
temp <- host_url %>%
  html %>%
  html_nodes("table")

# extract the relevant table
hosts <- data.frame(html_table(temp[1]))

# remove the years that the Olympics were not held
hosts <- hosts[!grepl("not held", hosts$Host.City..Country), ]

# remove the cities from the host column
countries <- hosts$Host.City..Country
countries <- gsub(".*,", "", countries)
hosts$Host.City..Country <- countries

# remove the Olympics that are ongoing (or are yet to occur) and generally
# tidy the table up. Also, only select post-1948 games.
hosts <- hosts %>%
  select(-Olympiad) %>%
  select(year = Year, host = Host.City..Country) %>%
  filter(year < 2016 & year > 1948)

# remove white space from the names
hosts$host <- gsub(" ", "", hosts$host, fixed = TRUE)

# change host England to Great Britain. 
# change SouthKorea to South Korea
# change USSR to Russia
hosts$host <- gsub("England", "Great Britain", hosts$host, fixed = TRUE)
hosts$host <- gsub("SouthKorea", "South Korea", hosts$host, fixed = TRUE)
hosts$host <- gsub("USSR", "Russia", hosts$host, fixed = TRUE)

### get the medal tables for each year and store them in one list

# get a vector of all years
years <- hosts$year

# create a list to store the medal tables
medal_tables <- list()

# loop over each year and retrieve the data from Wikipedia
for(i in 1:length(years)){
  # what is the current year?
  curr_year <- years[i]
  # construct the relevant URL to the Wikipedia page
  url <- paste("https://en.wikipedia.org/wiki/", curr_year, 
               "_Summer_Olympics_medal_table", sep = "")
  # retrieve the data from this page
  temp <- url %>%
    html %>%
    html_nodes("table")
  # find the html table's position. The medal table is in a "sortable" Wiki 
  # table, so we search for this term and return its position in the list
  position <- grep("sortable", temp)
  # get the medal table. Add a new column storing the year
  medals <- data.frame(html_table(temp[position], fill = TRUE))
  medals <- medals %>%
    mutate(Year = curr_year)
  # change the names of the "Nation" column, as this is not consistent between
  # games tables
  colnames(medals)[2] <- "Nation"
  # remove the weird symbols from the html file (Â)
  nations <- medals$Nation
  nations <- gsub("[^\\x{00}-\\x{7f}]", "", nations, perl = TRUE)
  # we need to change "Soviet Union" to USSR for consistency
  nations <- gsub("Soviet Union(URS)", "Russia(RUS)", nations, fixed = TRUE)
  # also change West & East Germany to "Germany"
  nations <- gsub("East Germany(GDR)", "Germany(GER)", nations, fixed = TRUE)
  nations <- gsub("West Germany(FRG)", "Germany(GER)", nations, fixed = TRUE)
  medals$Nation <- nations

  # save the medal table and move to the next games
  medal_tables[[i]] <- medals
}


### loop over each host, then find how many medals they won in each games and
### store it in data frame

# initialise the data frame
final_data <- data.frame(hosts)
final_data[, as.character(years)] <- 0

for(i in 1:length(hosts$host)){
  # get the current host
  curr_host <- hosts$host[i]

  # loop over all years, find the number of medals won by the current host, 
  # and store it in final_data frame
  for(j in 1:length(years)){
    # what is the current year?
    curr_year <- years[j]
    # get the medal table for the current year
    curr_medals <- medal_tables[[j]]
    # get the row for the current host if it is present
    curr_medals <- curr_medals %>%
      filter(str_detect(Nation, curr_host))
    # collate the number of medals won if there is data
    if(nrow(curr_medals) > 0){
      final_data[i, j + 2] <- sum(curr_medals$Total)
    } else {
      final_data[i, j + 2] <- 0
    }
  } # end of each year loop
} # end of each host loop

### now do some plotting
pdf("medals.pdf", width = 12, height = 12)

# change the layout of the plotting window
par(mfrow = c(4, 4))

# loop over each hosting nation
for(i in 1:nrow(final_data)){
  # get the current host's data for all years
  host_data <- as.numeric(final_data[i, 3:ncol(final_data)])
  # what is their mean number of medals won?
  host_mean <- mean(host_data)
  # plot the data!
  plot(years, host_data, xlab = "Year", ylab = "Number of Medals", pch = 19, 
       type = "b", lwd = 2, 
       main = paste(hosts$host[i], "–", years[i], sep = ""))
  abline(v = final_data$year[i], lty = "dashed", col = "blue", lwd = 1.5)
  abline(h = host_mean, lty = "dashed", col = "red", lwd = 1.5)
}
dev.off()


Solution to #BarBarPlots in R

I came across an interesting project the other day which is calling for a reconsideration of the use of bar plots (#barbarplots), with the lovely tag-line “Friends don’t let friends make bar plots!”. The project elegantly outlines convincing reasons why bar plots can be misleading, and has successfully funded a campaign to “…increase awareness of the limitations that bar plots have and the need for clear and complete data visualization”.

In this post, I want to show the limitations of bar plots that these scientists have highlighted. Then, for researchers who want to continue using bar plots, I provide a solution to these limitations that can easily be cobbled together in R (with the ggplot2 package).

The Data

Say you are a researcher who collects some data (it doesn’t matter on what) from two independent groups and you are interested in whether there is a difference between them. Most researchers would maybe calculate the mean and standard error of each group to describe the data. Then the researcher might plot the data using a bar plot, together with error bars representing the standard error. To provide an inferential test on whether a difference exists, the researcher would usually conduct an independent samples t-test.

Let’s provide some example data for two conditions:

  • condition A (n = 100): mean of 200.17, a median of 196.43, and a standard error of 6.12
  • condition B (n = 100): mean of 200.11, a median of 197.87, and a standard error of 7.19

Here is the bar plot:


Pretty similar, right? The researcher sees that there is little evidence for a difference; to test this inferentially they conduct an independent samples t-test, with the outcome t(198) = 0.007, p = .995, Cohen’s d < 0.001. The researcher concludes there is no difference between the two groups.

The Problem

The problem raised by the #barbarplot campaign is that bar plots are a poor summary of the distribution of data. The bar plot above suggests there is no difference between the two groups, but the two groups are different! How do I know they are different? I simulated the data. What the bar plot hides is the shape of the underlying distribution of each data set. Below I present a density plot (basically a smoothed histogram) of the same data as above:


Now we can see that the two groups are clearly different! Condition A is a normal distribution, but condition B is bi-modal. The bar plot doesn’t capture this difference.
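This is easy to reproduce. The sketch below (Python, for illustration; the means and SDs echo the example above rather than the exact simulated data) builds two samples with near-identical means and standard errors, one unimodal and one bimodal:

```python
import random
import statistics

random.seed(1)

# Condition A: one normal distribution centred on 200
a = [random.gauss(200, 60) for _ in range(100)]

# Condition B: a bimodal mixture with (roughly) the same mean
b = ([random.gauss(130, 10) for _ in range(50)]
     + [random.gauss(270, 10) for _ in range(50)])

for name, data in [("A", a), ("B", b)]:
    se = statistics.stdev(data) / len(data) ** 0.5
    print(name, round(statistics.mean(data), 1), round(se, 1))
```

Both conditions report a mean near 200 with similar standard errors, yet condition B has essentially no observations anywhere near its own mean: the bar plot summarises a value the data never take.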

The Solution

Density plots are a nice solution to presenting the distribution of data, but can get really messy when there are multiple conditions (imagine the above density plot but with 4 or more overlapping conditions). Plus, researchers are used to looking at bar plots, so there is something to be said about continuing their use (especially for factorial designs). But how do we get around the problem highlighted by the #barbarplot campaign?

One solution is to plot the bar plots as usual, but to overlay the bar plot with individual data points. Doing this allows the reader to see the estimates of central tendency (i.e., to interpret the bar plot as usual), whilst at the same time allowing the reader to see the spread of data in each condition. This sounds tricky to do (and it probably is if you are still using Excel; yes, I’m talking to you!), but it’s simple if you’re using R.

Below is the above data plotted as a combined bar and point plot. As you can see, the difference in distribution is now immediately apparent, whilst retaining the advantages of a familiar bar plot. Everyone wins!



R Code

Below is the R code for the combined plot. This includes some code that generates the artificial data used in this example.

# load required packages (assumed from the functions used below)
library(dplyr)
library(ggplot2)

#--- Generate artificial data

# set random seed so example is reproducible
set.seed(42)  # any fixed value will do

# generate condition A
condition <- rep("condition_A", 100)
dv_A <- rnorm(100, 200, 60)
condition_A <- data.frame(condition, dv = dv_A)

# generate condition B
condition <- rep("condition_B", 100)
dv_B <- c(rnorm(50, 130, 10), rnorm(50, 270, 10))
condition_B <- data.frame(condition, dv = dv_B)

# put all in one data frame
raw_data <- rbind(condition_A, condition_B)

# calculate summary statistics
data_summary <- raw_data %>%
  group_by(condition) %>%
  summarise(mean = mean(dv), 
            median = median(dv),
            se = (sd(dv)) / sqrt(length(dv)))

#--- Do the "combined" bar plot
p2 <- ggplot()

# first draw the bar plot
p2 <- p2 + geom_bar(data = data_summary,
                    aes(y = mean,x = condition,
                        ymin = mean - se,
                        ymax = mean + se), fill = "darkgrey",
                    stat="identity", width=0.4)

# draw the error bars on the plot
p2 <- p2 + geom_errorbar(data = data_summary,
                         aes(y = mean, x = condition,
                             ymin = mean - se,
                             ymax = mean + se), stat = "identity",
                         width = 0.1, size = 1)

# now draw the points on the plot
p2 <- p2 + geom_point(data = raw_data, aes(y = dv, x = condition),
                      size = 3, alpha = 0.3,
                      position = position_jitter(width = 0.3, height = 0.1))

# scale and rename the axes, and make font size a bit bigger
p2 <- p2 + coord_cartesian(ylim = c(50, 400))
p2 <- p2 + scale_x_discrete(name = "Condition") +
  scale_y_continuous(name = "DV")

p2 <- p2 + theme(axis.text = element_text(size = 12),
                 axis.title = element_text(size = 14,face = "bold"))

# view the plot
p2


(Pesky?) Priors

When I tell people I am learning Bayesian statistics, I tend to get one of two responses: either people look at me blankly—“What’s Bayesian statistics?”—or I get scorned for using such “loose” methods—“Bayesian analysis is too subjective!” [1]. This latter “concern” arises due to (what I believe to be) a misunderstanding of the prior: Bayesian analysis requires one to state one’s prior belief about a certain effect, and then combine this with the data observed (i.e., the likelihood) to update one’s belief (the posterior).

On the face of it, it might seem odd for a scientific method to include “subjectivity” in its analysis. I certainly had this doubt when I first started learning it. (And, to be honest with myself, I still struggle with it sometimes.) But the more I read, the more I think this concern is not warranted, as the prior is not really “subjectivity” in the strictest sense of the word at all: it is based on our current understanding of the effect we are interested in, which in turn is (often) based on data we have seen before. Yes, sometimes the prior can be a guess if we [2] have no other information to go on, but we would express the uncertainty of such a belief in the prior itself.

The more I understand Bayesian statistics, the more I appreciate the prior is essential. One under-stated side-effect of having priors is that it can protect you from dubious findings. For example, I have a very strong prior against UFO predictions; therefore, you are going to have to present me with a lot more evidence than some shaky video footage to convince me otherwise. You would not have to provide me with much evidence, however, if you claimed to have roast beef last night. Extraordinary claims require extraordinary evidence.

But, during my more sceptical hours, I often succumbed to the the-prior-is-nothing-but-subjectivity-poisoning-your-analysis story. However, I now believe that even if one is sceptical of the use of a prior, there are a few things to note:

  • If you are concerned your prior is wrong and is influencing your inferences, just collect more data: A poorly-specified prior will be washed away with sufficient data.

  • The prior isn’t (really) subjective because it would have to be justified to a sceptical audience. This requires (I suggest) plotting what the prior looks like so readers can familiarise themselves with your prior. Is it really subjective if I show you what my prior looks like and I can justify it?

  • Related to the above, the effect of the prior can be investigated using robustness checks, where one plots the posterior distribution based on a range of (plausible) prior values. If your conclusions don’t depend upon the exact prior used, what’s the problem?

  • Priors are not fixed. Once you have collected some data and have a posterior belief, if you wish to examine the effect further you can (and should) use the posterior from the previous study as your prior for the next study.

These are the points I mention to anti-Bayesians I encounter. In this blog post I just want to run through some of them with examples. This is selfish; it’s not really for your education (there really are better educators out there: my recommendation is Alex Etz’s excellent “Understanding Bayes” series, from which this blog post takes much inspiration!). I just want somewhere with all of this written down so that next time someone criticises my interest in Bayesian analysis I can just reply: “Read my blog!”. (Please do inform me of any errors/misconceptions by leaving a comment!)

As some readers might not be massively familiar with these issues, I try to highlight some of the characteristics of the prior below. In all of these examples, I will use the standard Bayesian “introductory tool” of assessing the degree of bias in a coin by observing a series of flips.

A Fair Coin

If a coin is unbiased, it should produce roughly equal heads and tails. However, often we don’t know whether a coin is biased or not. We wish to estimate the bias in the coin (denoted theta) by collecting some data (i.e., by flipping the coin); a fair coin has a theta = 0.5. Based on this data, we can calculate the likelihood of various theta values. Below is the likelihood function for a fair coin.


In this example, we flipped the coin 100 times, and observed 50 heads and 50 tails. Note how the peak of the likelihood is centered on theta = 0.5. A biased coin would have a true theta not equal to 0.5; theta closer to zero would reflect a bias towards tails, and a theta closer to 1 would reflect a bias towards heads. The animation below demonstrates how the likelihood changes as the number of observed heads (out of 100 flips) increases:


So, the likelihood contains the information provided by our sample about the true value for theta.
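For readers who want to reproduce this, the likelihood curve for the 50-heads-in-100-flips example can be generated in a couple of lines of base R (a minimal sketch; my plots were generated with more elaborate code):

```r
# binomial likelihood of each candidate theta, given 50 heads in 100 flips
theta <- seq(0, 1, length.out = 1001)
likelihood <- dbinom(50, size = 100, prob = theta)

# plot the likelihood function
plot(theta, likelihood, type = "l", lwd = 2,
     xlab = "theta", ylab = "Likelihood")

# the likelihood peaks at the sample proportion, 50/100 = 0.5
theta[which.max(likelihood)]
```

Changing the first argument of dbinom (the number of observed heads) reproduces the shifting peaks shown in the animation.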

The Prior

Before collecting data, Bayesian analysts would specify what their prior belief was about theta. Below I present various priors a Bayesian may have using the beta distribution (which has two parameters: a and b):

The upper left plot reflects a prior belief that the coin is fair (i.e., the peak of the distribution is centered over theta = 0.5); however, there is some uncertainty in this prior as the distribution has some spread. The upper right plot reflects total uncertainty in a prior belief: that is, the prior holds that any value of theta is likely. The lower two plots reflect prior beliefs that the coin is biased. Maybe the researcher had obtained the coin from a known con-artist. The lower left plot reflects a prior for a biased coin, but uncertainty about which side the coin is biased towards (that is, it could be biased heads or tails); the lower right plot reflects a prior that the coin is biased towards heads.
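The four shapes described above can be recreated in base R with something like the following; note that the a and b values here are my illustrative choices, not necessarily the exact values behind the figure:

```r
# four beta priors matching the four plots described above
theta <- seq(0.001, 0.999, length.out = 999)
par(mfrow = c(2, 2))

# fair coin, with some uncertainty: peaked at theta = 0.5
plot(theta, dbeta(theta, 20, 20), type = "l", main = "Fair, some uncertainty")

# total uncertainty: all values of theta equally likely
plot(theta, dbeta(theta, 1, 1), type = "l", main = "Total uncertainty")

# biased, but direction unknown: U-shaped, mass near 0 and 1
plot(theta, dbeta(theta, 0.5, 0.5), type = "l", main = "Biased, direction unknown")

# biased towards heads: mass above theta = 0.5
plot(theta, dbeta(theta, 20, 5), type = "l", main = "Biased towards heads")
```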

The effect of the prior

I stated above that one of the benefits of the prior is that it allows protection (somewhat) from spurious findings. If I have a really strong prior belief that the coin is fair, 9/10 heads isn’t going to be all that convincing evidence that it is not fair. However, if I have a weak prior that the coin is fair, then I will be quite convinced by the data.

This is illustrated below. Both priors below reflect the belief that the coin is fair; what differs between the two is the strength in this belief. The prior on the left is quite a weak belief, as the distribution (although peaked at 0.5) is quite spread out. The prior on the right is a stronger belief that the coin is fair.

In both cases, the likelihood is the result of observing 9/10 heads.


You can see that when the prior is a weak belief, the posterior is very similar to the likelihood; that is, the posterior belief is almost entirely dictated by the data. However, when we have a strong prior belief, our beliefs are not altered much by observing just 9/10 heads.
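Because the beta prior is conjugate to the binomial likelihood, this updating can be written down exactly: a Beta(a, b) prior combined with h heads in n flips gives a Beta(a + h, b + n - h) posterior. A quick sketch (the prior strengths below are illustrative choices, not the exact values behind my plots):

```r
# mode of the Beta(a + h, b + n - h) posterior after h heads in n flips
posterior_mode <- function(a, b, h, n) (a + h - 1) / (a + b + n - 2)

# weak fair-coin prior, 9/10 heads: belief pulled strongly towards the data
posterior_mode(a = 5, b = 5, h = 9, n = 10)    # 13/18, about 0.72

# strong fair-coin prior, same data: belief barely moves from 0.5
posterior_mode(a = 50, b = 50, h = 9, n = 10)  # 58/108, about 0.54
```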

Now, I imagine that this is the anti-Bayesian’s point: “Even with clear data you haven’t changed your mind.” True. Is this a negative? Well, imagine instead this study was assessing the existence of UFOs rather than simple coin flips. If I showed you 9 YouTube videos of UFO “evidence”, and 1 video showing little (if any) evidence, would you be convinced of UFOs? I doubt it. You were the right-hand plot in this case. (I know, I know, the theta distribution doesn’t make sense in this case, but ignore that!)

What if the prior is wrong?

Worried that your prior is wrong3, or that you cannot justify it completely? Throw more data at it. (When is this ever a bad idea?) Below are the same priors, but now we flip the coin 1,000 times and observe 900 heads. (Note that the proportion of heads is the same as in the previous example.) Now, even our strong prior belief has to be updated considerably based on this data. With more data, even mis-specified priors do not affect inference.
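Using the same conjugate update as before (again with an illustrative prior), even a strong "fair coin" prior is dominated once we observe 900 heads in 1,000 flips:

```r
# strong fair-coin prior: Beta(50, 50), mode at 0.5 (illustrative values)
a <- 50; b <- 50

# observe 900 heads in 1,000 flips; posterior is Beta(a + 900, b + 100)
h <- 900; n <- 1000
post_mode <- (a + h - 1) / (a + b + n - 2)
post_mode   # about 0.86: far from the prior mode of 0.5
```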


To get an idea of how sample size influences the effect of the prior on the posterior, I created the below gif animation. In it, we have a relatively strong (although not insanely so) prior belief that the coin is biased “heads”. Then, we start flipping the coin, and update the posterior after each flip. In fact, this coin is fair, so our prior is not in accord with (unobservable) “reality”. As the number of flips increases, though, our posterior starts to match the likelihood of the data. So, “wrong” priors aren’t really a problem. Just throw more data at it.


“Today’s posterior is tomorrow’s prior” — Lindley (1970)

After collecting some data and updating your prior, you now have a posterior belief of something. If you wish to collect more data, you do not use your original prior (because it no longer reflects your belief), but you instead use the posterior from your previous study as the prior for your current one. Then, you collect some data, update your priors into your posteriors…and so on.

In this sense, Bayesian analysis is ultimately “self-correcting”: as you collect more and more data, even horrendously-specified priors won’t matter.

In the example below, we have a fairly-loose idea that the coin is fair—i.e., theta = 0.5. We flip a coin 20 times, and observe 18 heads. Then we update to our posterior, which suggests the true value for theta is about 0.7 ish. But then we wish to run a second “study”; we use the posterior from study 1 as our prior for study 2. We again observe 18 heads out of 20 flips, and update accordingly.
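In conjugate form, this two-study sequence is just two successive beta updates; the Beta(4, 4) starting prior below is an illustrative stand-in for the loose prior in my plots:

```r
# study 1: loose fair-coin prior, then 18 heads and 2 tails
a <- 4; b <- 4
a <- a + 18; b <- b + 2                 # posterior: Beta(22, 6)
study1_mode <- (a - 1) / (a + b - 2)    # about 0.81

# study 2: yesterday's posterior is today's prior; another 18/20 heads
a <- a + 18; b <- b + 2                 # posterior: Beta(40, 8)
study2_mode <- (a - 1) / (a + b - 2)    # belief in a heads bias sharpens

c(study1_mode, study2_mode)
```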



One of the nicest things about Bayesian analysis is that the way our beliefs should be updated in the face of incoming data is clearly (and logically) specified. Many people’s concerns surround the prior. I hope I have shed some light on why I do not consider this to be a problem. Even though I don’t think the prior is something that needs to be “overcome” with lots of data, it should reassure the anti-Bayesian that, with sufficient data, the prior doesn’t really matter much.

So, stop whining about Bayesian analysis, and go collect more data. Always, more data.

Click here for the R code for this post

  1. Occasionally (although this is happening more and more) I get a slow, wise, agreeing nod. I like those. 
  2. I really wanted to avoid the term “we”, as it implies I am part of the “in-group”: those experts of Bayesian analysis who truly appreciate all of its beauty, and are able to apply it to all of their experimental data. I most certainly do not fall into this camp; but I am trying. 
  3. Technically, it cannot be “wrong” because it is your belief. If that belief is justifiable, then it’s fair game. You may have to update your prior, though, if considerable data contradict it. But, bear with me. 

Happy (Geeky!) December!

I read a blog this morning introducing a new R package that allows users to include emojis into their ggplot2 plots. As it is the 1st of December, I thought I would try this new package out with a winter-themed plot. Enjoy!


R Code

# the emoji geoms here come from the emoGG package (my assumption of the
# package being referenced; it extends ggplot2 with emoji support)
library(ggplot2)
library(emoGG)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_emoji(emoji = "2744") + add_emoji(emoji = "26c4")

Animating Robustness-Check of Bayes Factor Priors

Today I submitted a paper which contained some Bayesian analysis using Bayes factors (a default Bayesian t-test). As the test used a default prior on the effect size (a Cauchy distribution centred on zero with rate [r] = 0.707), I wanted to appease any reviewer concern that the final Bayes factor reported was some peculiarity of the prior rather than due to the data.

So, I performed a standard robustness check on the prior, where I plot the resulting Bayes factor across a wide range of prior values (in this case, across a wide range of Cauchy rate parameters). The plot below is the result:

The plot shows the Bayes factor with the default prior as the black dot; the line represents how the Bayes factor changes as a function of the Cauchy prior width. This analysis suggests the final conclusion is not a peculiarity of the prior, and as such the reported Bayes factor is quite robust.

What this plot doesn’t show, though, is what the prior distribution looks like as the rate (r) changes. Ideally, I wanted to show what the Cauchy distribution looks like across the entire range of r values explored in the plot above; this helps me (and I guess other people) understand how the prior is influencing the Bayes factor. Visualisation is very important sometimes.

So, I knocked up some R code which plots the robustness check above, but plots next to it the density of the Cauchy distribution with the rate parameter currently being explored. The animation cycles through increasing values for rate r, and shows the density for this rate on the left, and the Bayes factor for my data with this prior on the right. For me, this animation provides a nice insight into how the Bayes factor is being influenced by the prior.


R Code

The code for generating the individual plots is below. Note that all of these plots then need to be fed into your favourite gif-making device! (I used http://gifmaker.me/).

### plots of prior robustness check
# gif created from PNG plots using http://gifmaker.me/
# clear R's memory
rm(list = ls())
# load the Bayes factor package
library(BayesFactor)
# set working directory (will be used to save files here, so make sure
# this is where you want to save your plots!)
setwd("D:/Work//Blog_YouTube code//Blog/Prior Robust Visualisation/plots")
### declare some variables for the analysis
# what is the t-value for the data?
tVal <- 3.098
# how many points in the prior should be explored?
nPoints <- 1000
# what Cauchy rates should be explored?
cauchyRates <- seq(from = 0.01, to = 1.5, length.out = nPoints)
# what effect sizes should be plotted?
effSize <- seq(from = -2, to = 2, length.out = nPoints)
# get the Bayes factor for each prior value
bayesFactors <- sapply(cauchyRates, function(x)
exp(ttest.tstat(t = tVal, n1 = 76, rscale = x)[['bf']]))
### do the plotting
# how many plots do we want to produce?
nPlots <- 50
plotWidth <- round(seq(from = 1, to = nPoints, length.out = nPlots), 0)
# loop over each plot
for(i in plotWidth){
# set up the file
currFile <- paste(getwd(), "/plot_", i, ".png", sep = "")
# initiate the png file
png(file = currFile, width = 1200, height = 1000, res = 200)
# change the plotting window so plots appear side-by-side
par(mfrow = c(1, 2))
# do the prior density plot
d <- dcauchy(effSize, scale = cauchyRates[i])
plot(effSize, d, type = "l", ylim = c(0, 5), xlim = c(-2, 2),
ylab = "Density", xlab = "Effect Size (d)", lwd = 2, col = "gray48",
main = paste("Rate (r) = ", round(cauchyRates[i], 3), sep = ""))
# do the Bayes factor plot
plot(cauchyRates, bayesFactors, type = "l", lwd = 2, col = "gray48",
ylim = c(0, max(bayesFactors)), xaxt = "n",
xlab = "Cauchy Prior Width (r)", ylab = "Bayes Factor (10)")
abline(h = 0, lwd = 1)
abline(h = 6, col = "black", lty = 2, lwd = 2)
axis(1, at = seq(0, 1.5, 0.25))
# add the BF at the default Cauchy point
points(0.707, 9.97, col = "black", cex = 1.5, pch = 21, bg = "red")
# add the BF for the Cauchy prior currently being plotted
points(cauchyRates[i], bayesFactors[i], col = "black", pch = 21, cex = 1.3,
bg = "cyan")
# add legend
legend(x = 0.25, y = 3, legend = c("r = 0.707", paste("r = ",
round(cauchyRates[i], 3),
sep = "")),
pch = c(21, 21), lty = c(NA, NA), lwd = c(NA, NA), pt.cex = c(1, 1),
col = c("black", "black"), pt.bg = c("red", "cyan"), bty = "n")
# save the current plot
dev.off()
}

Recruitment Order & Sequential Bayes Factors

One advantage of using Bayes factors (or any other form of Bayesian analysis) is the ability to engage in optional stopping. That is, one can collect some data, perform the critical Bayesian test, and stop data collection once a pre-defined criterion has been obtained (e.g., until “strong” evidence has been found in support of one hypothesis over another). If the criterion has not been met, one can resume data collection until it has.

Optional stopping is problematic when using p-values, but is “No problem for Bayesians!” (Rouder, 2014; see also this blog post from Kruschke). A recent paper in press at Psychological Methods (Schönbrodt et al., in press) shows you how to use optional stopping with Bayes factors—so-called “sequential Bayes factors” (SBF).

The figure below shows a recent set of data from my lab where we used SBF for optional stopping. Following the advice of the SBF paper, we set our stopping criteria as N > 20 and BF10 > 6 (evidence in favour of the alternative) or BF10 < 1/6 (evidence in favour of the null); these criteria are shown as horizontal dashed lines. The circles represent the progression of the Bayes factor (shown in log scale) as the sample size increased. We stopped at 76 participants, with “strong” evidence supporting the alternative hypothesis.

One thing troubled me when I was writing these results up: I could well imagine a skeptical reviewer noticing that the Bayes factor seemed to be in support of the null between sample sizes of 30 and 40, although the criterion for stopping was not reached. A reviewer might wonder, therefore, whether had some of the later participants who showed no effect (see e.g., participants 54–59) been recruited earlier, would we have stopped data collection and found evidence in favour of the null? Put another way, how robust was our final result to the order in which our sample were recruited to the experiment?

Assessing the Effect of Recruitment Order

I was not aware of any formal way to assess this question, so for the purposes of the paper I was writing I knocked up a quick simulation. I randomised the recruitment order of our sample, and plotted the sequential Bayes factor for this “new” recruitment order. I performed this 50 times, plotting the trajectory of the SBF. I reasoned that if our conclusion was robust against recruitment order, the stopping rule in favour of the null should be rarely (if ever) met. The results of this simulation are below. As can be seen, the stopping criterion for the null was never met, suggesting our results were robust to recruitment order.
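The shuffling logic of that simulation can be sketched in a few lines. The real analysis tracked Bayes factors (via the BayesFactor package); here a running one-sample t-statistic stands in for the sequential statistic, and the data are simulated rather than the paper’s actual sample:

```r
# sketch of the recruitment-order robustness check
set.seed(1)
scores <- rnorm(76, mean = 0.3)   # 76 simulated effect scores (hypothetical data)

# sequential statistic computed after each "recruit", starting at minN
sequentialStat <- function(x, minN = 20) {
  sapply(minN:length(x), function(n) unname(t.test(x[1:n])$statistic))
}

# 50 random recruitment orders; one trajectory per column
trajectories <- replicate(50, sequentialStat(sample(scores)))
dim(trajectories)
```

Plotting each column of `trajectories` against sample size gives the spaghetti-style plot shown above; checking how often a trajectory crosses the stopping criteria quantifies sensitivity to recruitment order.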



Recruitment Order Simulation

Then, I became curious: How robust are SBFs to recruitment order when we know what the true effect is? I simulated a set of 100 participants for an effect known to be small (d = 0.3), and plotted the SBF as “participants” were recruited. I tried different random seeds until I found a pattern I was interested in: I wanted a set of data where the SBF comes very close to the stopping rule in favour of the null, even though we know the alternative is true. It took me a few tries, but eventually I got the plot below.

Note the dip toward the null criterion is even more extreme than in my experiment. Thus, this set of data seemed perfect to test my method for assessing the robustness of the final conclusion (“strong evidence for alternative”) against recruitment order. I followed the same procedure as before, simulating 50 random recruitment orders, and plotted the results below.

The null criterion is only met in one simulated “experiment”. This seems to be very pleasing, at least for the paper I am due to submit soon: the SBF (at least in these two small examples) seem robust against recruitment order.

R Code

If you would like to see the R code for the above plots, then you can download the html file for this post which includes R code.

trimr: An R Package of Response Time Trimming Methods

Response times are a dominant dependent variable in cognitive psychology (and other areas). It is common that raw response times need to undergo some trimming before being submitted to inferential analysis; this is because RTs typically suffer from outliers: a small proportion of RTs that lie at the extremes of the response time distribution and are thought to arise from processes not under investigation.

There are a wide array of response time trimming methods that I outlined in a previous post. Some are very simple to implement (such as removing all RTs slower than 2 seconds, for example). Some are intermediate in terms of difficulty of implementation (such as removing all RTs slower than 2.5 standard deviations above the mean of each participant of each condition). Some are downright tricky to implement, such as the modified recursive procedure of Van Selst & Jolicoeur (1994).

To make response time trimming simpler for the researcher, I have developed a small R package—trimr—that takes raw RTs for all participants and experimental conditions, performs any trimming method the user requires, and returns data for all participants and conditions ready for inferential testing.

Below I provide an overview of how to use trimr.


trimr is an R package that implements most commonly-used response time trimming methods, allowing the user to go from a raw data file to a finalised data file ready for inferential statistical analysis.

trimr is available from CRAN. To download it, open R and type:

install.packages("trimr")

To install the latest version of trimr (i.e., the development version of next release), install devtools, and install directly from GitHub by using:

# install devtools
install.packages("devtools")

# install trimr from GitHub
devtools::install_github("JimGrange/trimr")
(To report any bugs in trimr—which are likely—please see my GitHub account for trimr, and click on “issues” in the top right.)

The trimming functions fall broadly into three families (together with the function names for each method implemented in trimr):

  1. Absolute Value Criterion:
    • absoluteRT
  2. Standard Deviation Criterion:
    • sdTrim
  3. Recursive / Moving Criterion:
    • nonRecursive
    • modifiedRecursive
    • hybridRecursive

Example Data

trimr ships with some example data—“exampleData”—that the user can explore the trimming functions with. This data is simulated (i.e., not real), and contains data from 32 subjects. The data come from a task switching experiment, where RT and accuracy were recorded for two experimental conditions: Switch, when the task switched from the previous trial, and Repeat, when the task repeated from the previous trial.

# load the trimr package
library(trimr)

# activate the data
data(exampleData)

# look at the top of the data
head(exampleData)
##   participant condition   rt accuracy
## 1           1    Switch 1660        1
## 2           1    Switch  913        1
## 3           1    Repeat 2312        1
## 4           1    Repeat  754        1
## 5           1    Switch 3394        1
## 6           1    Repeat  930        1

The exampleData consists of 4 columns:

  • participant: Codes the number of each participant in the experiment
  • condition: In this example, there are two experimental conditions: “Switch”, and “Repeat”.
  • rt: Logs the response time of the participant in milliseconds.
  • accuracy: Logs the accuracy of the response. 1 codes a correct response, 0 an error response.

At a minimum, users analysing their own data need columns with these names in the data frame they pass to trimr. The user can use RTs logged in milliseconds (as here) or in seconds (e.g., 0.657). The user can control the number of decimal places to round the trimmed data to.

Absolute Value Criterion

The absolute value criterion is the simplest of all the trimming methods available (except, of course, having no trimming at all). Upper and lower criteria are set, and any response time that falls outside these limits is removed. The function that performs this trimming method in trimr is called absoluteRT.


In this function, the user declares lower and upper criteria for RT trimming (the minRT and maxRT arguments, respectively); RTs outside of these criteria are removed. Note that these criteria must be in the same unit as the RTs logged in the data frame being used. The function also has some other important arguments:

  • omitErrors: If the user wishes error trials to be removed from the trimming, this needs to be set to TRUE (it is set to this by default). Alternatively, some users may wish to keep error trials included. Therefore, set this argument to FALSE.
  • returnType: Here, the user can control how the data are returned. “raw” returns trial-level data after the trials with trimmed RTs are removed; “mean” returns calculated mean RT per participant per condition after trimming; “median” returns calculated median RT per participant per condition after trimming. This is set to “mean” by default.
  • digits: How many digits to round the data to after trimming? If the user has a data frame where the RTs are recorded in seconds (e.g., 0.657), this argument can be left at its default value of 3. However, if the data are logged in milliseconds, it might be best to change this argument to zero, so there are no decimal places in the rounding of RTs (e.g., 657).

In this first example, let’s trim the data using criteria of RTs less than 150 milliseconds and greater than 2,000 milliseconds, with error trials removed before trimming commences. Let’s also return the mean RTs for each condition, and round the data to zero decimal places.

# perform the trimming
trimmedData <- absoluteRT(data = exampleData, minRT = 150, 
                          maxRT = 2000, digits = 0)

# look at the top of the data
##   participant Switch Repeat
## 1           1    901    742
## 2           2   1064    999
## 3           3   1007    802
## 4           4   1000    818
## 5           5   1131    916
## 6           6   1259   1067

Note that trimr returns a data frame with each row representing each participant in the data file (logged in the participant column), and separate columns for each experimental condition in the data.

If the user wishes to receive trial-level data back, change the “returnType” argument to “raw”:

# perform the trimming
trimmedData <- absoluteRT(data = exampleData, minRT = 150, 
                          maxRT = 2000, returnType = "raw", 
                          digits = 0)

# look at the top of the data
##    participant condition   rt accuracy
## 1            1    Switch 1660        1
## 2            1    Switch  913        1
## 4            1    Repeat  754        1
## 6            1    Repeat  930        1
## 7            1    Switch 1092        1
## 11           1    Repeat  708        1

Now, the data frame returned is in the same shape as the initial data file, but rows containing trimmed RTs are removed.

Standard Deviation Criterion

This trimming method uses a standard deviation multiplier as the upper criterion for RT removal (users still need to enter a lower-bound manually). For example, this method can be used to trim all RTs 2.5 standard deviations above the mean RT. This trimming can be done per condition (e.g., 2.5 SDs above the mean of each condition), per participant (e.g., 2.5 SDs above the mean of each participant), or per condition per participant (e.g., 2.5 SDs above the mean of each participant for each condition).


In this function, the user declares a lower bound on RT trimming (e.g., 150 milliseconds) and an upper bound in standard deviations. The value of the standard deviation used is set by the SD argument. How this is used varies depending on the values the user passes to two important function arguments:

  • perCondition: If set to TRUE, the trimming will occur above the mean of each experimental condition in the data file.
  • perParticipant: If set to TRUE, the trimming will occur above the mean of each participant in the data file.

Note that if both are set to TRUE, the trimming will occur per participant per condition (e.g., if SD is set to 2.5, the function will trim RTs 2.5 SDs above the mean RT of each participant for each condition).

In this example, let’s trim RTs faster than 150 milliseconds, and greater than 3 SDs above the mean of each participant, and return the mean RTs:

# trim the data
trimmedData <- sdTrim(data = exampleData, minRT = 150, sd = 3, 
                      perCondition = FALSE, perParticipant = TRUE, 
                      returnType = "mean", digits = 0)

# look at the top of the data
##   participant Switch Repeat
## 1           1   1042    775
## 2           2   1136   1052
## 3           3   1020    802
## 4           4   1094    834
## 5           5   1169    919
## 6           6   1435   1156

Now, let’s trim per condition per participant:

# trim the data
trimmedData <- sdTrim(data = exampleData, minRT = 150, sd = 3, 
                      perCondition = TRUE, perParticipant = TRUE, 
                      returnType = "mean", digits = 0)

# look at the top of the data
##   participant Switch Repeat
## 1           1   1099    742
## 2           2   1136   1038
## 3           3   1028    802
## 4           4   1103    834
## 5           5   1184    916
## 6           6   1461   1136

Recursive / Moving Criterion

Three functions in this family implement the trimming methods proposed & discussed by van Selst & Jolicoeur (1994): nonRecursive, modifiedRecursive, and hybridRecursive. van Selst & Jolicoeur noted that the outcome of many trimming methods is influenced by the sample size (i.e., the number of trials) being considered, thus potentially producing bias. For example, even if RTs are drawn from identical positively-skewed distributions, a “per condition per participant” SD procedure (see sdTrim above) would result in a higher mean estimate for small sample sizes than larger sample sizes. This bias was shown to be removed when a “moving criterion” (MC) was used; this is where the SD used for trimming is adapted to the sample size being considered.


The non-recursive method proposed by van Selst & Jolicoeur (1994) is very similar to the standard deviation method outlined above, with the exception that the user does not specify the SD to use as the upper bound. The SD used for the upper bound is instead determined by the sample size of the RTs being passed to the trimming function, with larger SDs being used for larger sample sizes. Also, the function only trims per participant per condition.

The nonRecursive function checks the sample size of the data being passed to it, and looks up the SD criterion required for the data’s sample size. The function looks in a data file contained in trimr called linearInterpolation. Should the user wish to see this data file (although the user will never need to access it if they are not interested), type:

# load the data
data(linearInterpolation)

# show the first 20 rows (there are 100 in total)
linearInterpolation[1:20, ]
##    sampleSize nonRecursive modifiedRecursive
## 1           1        1.458             8.000
## 2           2        1.680             6.200
## 3           3        1.841             5.300
## 4           4        1.961             4.800
## 5           5        2.050             4.475
## 6           6        2.120             4.250
## 7           7        2.173             4.110
## 8           8        2.220             4.000
## 9           9        2.246             3.920
## 10         10        2.274             3.850
## 11         11        2.310             3.800
## 12         12        2.326             3.750
## 13         13        2.334             3.736
## 14         14        2.342             3.723
## 15         15        2.350             3.709
## 16         16        2.359             3.700
## 17         17        2.367             3.681
## 18         18        2.375             3.668
## 19         19        2.383             3.654
## 20         20        2.391             3.640

Notice there are two columns. This current function will only look in the nonRecursive column; the other column is used by the modifiedRecursive function, discussed below. If the sample size of the current set of data is 16 RTs (for example), the function will use an upper SD criterion of 2.359, and will proceed much like the sdTrim function’s operations.

Note that this function can only return mean trimmed RTs (i.e., there is no “returnType” argument for this function).

# trim the data
trimmedData <- nonRecursive(data = exampleData, minRT = 150, 
                            digits = 0)

# see the top of the data
##   participant Switch Repeat
## 1           1   1053    732
## 2           2   1131   1026
## 3           3   1017    799
## 4           4   1089    818
## 5           5   1169    908
## 6           6   1435   1123


The modifiedRecursive function is more involved than the nonRecursive function. This function performs trimming in cycles. It first temporarily removes the slowest RT from the distribution; then, the mean of the sample is calculated, and the cut-off value is calculated using a certain number of SDs around the mean, with the value for SD being determined by the current sample size. In this procedure, the required SD decreases with increased sample size (cf. the nonRecursive method, where SD increases with sample size; see the linearInterpolation data file above); see Van Selst and Jolicoeur (1994) for justification.

The temporarily removed RT is then returned to the sample, and the fastest and slowest RTs are then compared to the cut-off, and removed if they fall outside. This process is then repeated until no outliers remain, or until the sample size drops below four. The SD used for the cut-off is thus dynamically altered based on the sample size of each cycle of the procedure, rather than static like the nonRecursive method.
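The cycle structure can be sketched in a few lines of base R. This is a simplified illustration, not trimr’s actual implementation: a fixed SD multiplier (k = 3) stands in for the sample-size-dependent criterion from the linearInterpolation table, and all RTs outside the cut-offs are trimmed each cycle:

```r
# simplified sketch of the modified-recursive idea (illustrative only)
modifiedRecursiveSketch <- function(rt, k = 3) {
  repeat {
    if (length(rt) < 4) break              # stop when sample gets too small
    rest <- rt[-which.max(rt)]             # temporarily remove slowest RT
    hi <- mean(rest) + k * sd(rest)        # cut-offs from the remaining RTs
    lo <- mean(rest) - k * sd(rest)
    keep <- rt >= lo & rt <= hi            # slowest RT is returned, then compared
    if (all(keep)) break                   # stop when no outliers remain
    rt <- rt[keep]
  }
  rt
}

# one obvious outlier among otherwise typical RTs (made-up values)
rts <- c(450, 480, 500, 520, 560, 600, 3000)
modifiedRecursiveSketch(rts)
```

Note how removing the slowest RT before computing the cut-offs stops a single extreme value from inflating the SD and masking itself.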

# trim the data
trimmedData <- modifiedRecursive(data = exampleData, minRT = 150, 
                                 digits = 0)

# see the top of the data
##   participant Switch Repeat
## 1           1    792    691
## 2           2   1036    927
## 3           3    958    716
## 4           4   1000    712
## 5           5   1107    827
## 6           6   1309   1049


van Selst and Jolicoeur (1994) reported slight opposing trends of the non-recursive and modified-recursive trimming methods (see page 648, footnote 2). They therefore, in passing, suggested a “hybrid-recursive” method might balance the opposing trends. The hybrid-recursive method simply takes the average of the non-recursive and the modified-recursive methods.

# trim the data
trimmedData <- hybridRecursive(data = exampleData, minRT = 150, 
                               digits = 0)

# see the top of the data
##   participant Switch Repeat
## 1           1    923    711
## 2           2   1083    976
## 3           3    987    757
## 4           4   1044    765
## 5           5   1138    867
## 6           6   1372   1086

Data from Factorial Designs

In the example data that ships with trimr, the RT data comes from just two conditions (Switch vs. Repeat), which are coded in the column “condition”. However, in experimental psychology, factorial designs are prevalent, where RT data comes from more than one independent variable, with each IV having multiple levels. How can trimr deal with this format?

First, let’s re-shape the exampleData set to how data might be stored from a factorial design. Let there be two IVs, each with two levels:

  1. taskSequence: Switch vs. Repeat
  2. reward: Reward vs. NoReward

The taskSequence factor is coding whether the task has Switched or Repeated from the task on the previous trial (as before). The reward factor is coding whether the participant was presented with a reward or not on the current trial (presented randomly). Let’s reshape our data frame to match this fictitious experimental scenario:

# get the example data that ships with trimr
data(exampleData)

# pass it to a new variable
newData <- exampleData

# add a column called "taskSequence" 
newData$taskSequence <- newData$condition

# add a column called "reward" 
# Fill it with random entries, just for example. 
# This uses R's "sample" function
newData$reward <- sample(c("Reward", "NoReward"), nrow(newData), 
                         replace = TRUE)

# delete the "condition" column
newData <- subset(newData, select = -condition)

# now let's look at our new data
head(newData)
##   participant   rt accuracy taskSequence   reward
## 1           1 1660        1       Switch   Reward
## 2           1  913        1       Switch NoReward
## 3           1 2312        1       Repeat NoReward
## 4           1  754        1       Repeat   Reward
## 5           1 3394        1       Switch   Reward
## 6           1  930        1       Repeat   Reward

This now looks how data typically comes in from a factorial design. Now, to get trimr to work on this, we need to create a new column called “condition”, and to place in this column the levels of all factors in the design. For example, if the first trial in our newData has taskSequence = Switch and reward = NoReward, we would like our condition entry for this trial to read “Switch_NoReward”. This is simple to do using R’s “paste” function. (Note that this code can be adapted to deal with any number of factors.)

# add a new column called "condition".
# Fill it with the levels of our factors
newData$condition <- paste(newData$taskSequence, "_", newData$reward, sep = "")

# look at the data
head(newData)
##   participant   rt accuracy taskSequence   reward       condition
## 1           1 1660        1       Switch   Reward   Switch_Reward
## 2           1  913        1       Switch NoReward Switch_NoReward
## 3           1 2312        1       Repeat NoReward Repeat_NoReward
## 4           1  754        1       Repeat   Reward   Repeat_Reward
## 5           1 3394        1       Switch   Reward   Switch_Reward
## 6           1  930        1       Repeat   Reward   Repeat_Reward
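As noted above, the same approach scales to any number of factors. Rather than writing a longer and longer paste call, you can collapse every factor column into the combined label in one go. Here is a minimal sketch; the three-factor data frame, its values, and the column names (including the hypothetical "load" factor) are invented purely for illustration:

```r
# hypothetical data frame with three factors (values are made up)
dat <- data.frame(
  taskSequence = c("Switch", "Repeat"),
  reward       = c("Reward", "NoReward"),
  load         = c("High", "Low")
)

# name every factor column that should feed into "condition"
factorCols <- c("taskSequence", "reward", "load")

# paste all of them together at once, separated by underscores
dat$condition <- do.call(paste, c(dat[factorCols], sep = "_"))

dat$condition
## [1] "Switch_Reward_High"  "Repeat_NoReward_Low"
```

Adding or dropping a factor then only means editing the factorCols vector, not the paste call itself.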

Now we can pass this data frame to trimr, and it will work perfectly.

# trim the data
trimmedData <- sdTrim(newData, minRT = 150, sd = 2.5)

# check it worked
# (remove participant column so it fits in blog
# window)
head(trimmedData[, -1])
##   Switch_Reward Switch_NoReward Repeat_NoReward Repeat_Reward
## 1      1053.348        1054.317         702.924       769.549
## 2      1147.131        1106.321        1009.798      1030.902
## 3       998.140        1030.262         791.779       799.584
## 4      1105.244        1073.707         835.189       807.512
## 5      1191.217        1148.509         902.888       903.907
## 6      1483.667        1398.000        1087.029      1160.127
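A follow-up question that often arises: sdTrim returns one column per combined condition, but for a factorial ANOVA you typically want the data long again, with the original factors recovered from the combined labels. A minimal base-R sketch of that step, using a small toy data frame in place of trimr's actual output (the RT values here are made up):

```r
# toy wide data standing in for trimr's trimmed output (values invented)
trimmed <- data.frame(
  participant     = 1:2,
  Switch_Reward   = c(1053, 1147),
  Switch_NoReward = c(1054, 1106),
  Repeat_NoReward = c(703, 1010),
  Repeat_Reward   = c(770, 1031)
)

# stack the condition columns into long format
condCols <- setdiff(names(trimmed), "participant")
longData <- data.frame(
  participant = rep(trimmed$participant, times = length(condCols)),
  condition   = rep(condCols, each = nrow(trimmed)),
  rt          = unlist(trimmed[condCols], use.names = FALSE)
)

# split "Switch_Reward" back into taskSequence = "Switch", reward = "Reward"
parts <- strsplit(longData$condition, "_", fixed = TRUE)
longData$taskSequence <- vapply(parts, `[`, "", 1)
longData$reward       <- vapply(parts, `[`, "", 2)
```

Because the condition labels were built with a consistent separator, strsplit recovers the factors exactly; this is one reason to avoid underscores within individual factor level names.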


Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R package version 1.0.1.

Van Selst, M. & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. Quarterly Journal of Experimental Psychology, 47A, 631–650.

flankr: Modelling Eriksen Flanker Task Performance

Do you do research using the Eriksen flanker task? Want to engage in some computational modelling of your data?

I’ve recently had a paper published in Behavior Research Methods reporting flankr, an R package that allows users to fit two recent computational models of performance in the Eriksen flanker task: the Dual-Stage Two-Phase (DSTP) model of Hübner et al. (2010), and the Shrinking Spotlight (SSP) model of White et al. (2011). The paper can be found here, or if you don’t have access to this journal, I have a pre-print of the article available here (but it doesn’t look as nice as the final journal version!).



I have tried to make the paper as accessible as possible to researchers with no programming experience and no experience in computational modelling. To help with this, I have made a short (ish!) video discussing the main elements of the DSTP and the SSP models, as well as a brief walkthrough of how to use flankr. This will primarily be used for my students, but I thought others might find it of use, too.

If you have any questions/problems about the package then please don’t hesitate to contact me! The script I use in the video can be downloaded here.