Over the past year I have become increasingly interested in using models of simple decision making processes, as measured by two-choice response time tasks. A simple 2-choice task might ask participants to judge whether a presented number is odd or even, for example. One popular model of 2-choice RT is the Ratcliff Diffusion Model. A simplified example of this model is presented below.

The idea behind the model is that, once presented, the stimulus elicits an evidence-accumulation process. That is, the cognitive system begins sampling the stimulus, and evidence begins to accumulate in a noisy manner towards one of two response boundaries. Once a boundary has been reached, that response is executed. One boundary reflects the correct response, and the other reflects the incorrect response. The average rate of evidence accumulation (which is a drift diffusion process) is reflected in a parameter **drift rate**. The amount of evidence that needs to accumulate before a response is made is governed by the distance between the two boundaries (**boundary separation **parameter). The height of these boundaries are symmetrical around a starting point of the drift diffusion process, reflected in a parameter **bias**. The bias is typically midway between the two boundaries, as (for example) 50% of trials will present odd stimuli, so the participant will have no reason to prefer one response over the other. The final parameter is called **non-decision time**, which reflects the time it takes to encode the stimulus and execute the motor response. The model is very complex mathematically and computationally, but implementing the model has been made easier over recent years due to the release of several software packages. Among these are E-J Wagenmakers and colleagues’ EZ diffusion model, Vandekerckhove & Tuerlinckx’s DMAT toolbox for MatLab, and Voss and Voss’ Fast-DM. For an excellent overview of the diffusion model, as well as available packages, see this paper by E-J Wagenmakers.

However, a new package (the “RWiener” package) has just been published by Wabersich and Vandekerckhove (see paper here) for R-Statistics. R is my favourite thing EVER (seriously), so I was thrilled to see an R package that can model response times.

## How many trials?

The package is very straightforward; it can generate synthetic data with known parameters as well as find best-fitting parameters from data you might have, which is superb. One question I was unsure of, though, is how accurate this parameter estimation routine is for varying number of trials in an experiment. That is, **how many trials do I need as a minimum in my experiment for this package to produce accurate parameter estimation?** With human data, this is a very difficult (impossible?) question to answer, but is tractable when fitting the model to artificially-generated data; that is, because we know which parameters gave rise to the data (because we tell the computer which to use), we can tell whether the estimated parameters are accurate (compare estimated parameters to “known” parameters).

I set up a simulation where I addressed the “how many trials?” question. The structure is straightforward:

- Decide on some “known” parameters
- Generate N trials with the known parameter for X number of simulated “experiments”
- Fit the model to each simulated experiment
- Compare estimated parameters to known parameters
- Repeat for various values of N

For the simulations reported, I simulated experiments which had Ns of 25, 50, 100, 250, 500, and 1000 trials. This spans from the ridiculously small experiments (N=25) to the ridiculously large experiments (N=1000). For each N, I conducted 100 simulated experiments. Then, I produced boxplots for each parameter which allowed me to compare the estimated parameters to the known parameters (which is shown as a horizontal line on each plot). Ns which recover the known parameters well will have boxplots which cluster closely to the horizontal line for each parameter.

## Conclusion

The fitting routine is able to produce reasonably accurate parameters estimates with as little as 250 trials; beyond this, there is not much advantage to increasing trial numbers. Note that 100 trials is still OK, but more will produce better estimates. There is no systematic bias in these estimates with low trial numbers except for the non-decision parameter; estimates seemed to be too high with lower trial numbers, and this increased bias decreased as trial numbers are boosted.

The RWiener package is a welcome addition for fitting models to response time data, producing reasonably accurate parameter estimates with as little as ~250 trials.

### R Code for above simulation

The code for the simulation is below. (Note, this takes quite a while to run, as my code is likely very un-elegant.)

rm(list = ls()) library(RWiener) #how many trials in each sample condition? nTrials <- c(25, 50, 100, 250, 500, 1000) #how many simulations for each sample? nSims <- 100 #known parameters parms <- c(2.00, #boundary separation 0.30, #non-decision time 0.50, #initial bias 0.50) #drift rate #what are the starting parmaeters for the parameter minimisation routine? startParms <- c(1, 0.1, 0.1, 1) #boundary, non-decision, initial bias, & drift #start simulations here for(i in 1:length(nTrials)){ #for each number of trials (25, 50, etc...) #get the current number of trials to generate tempTrials <- nTrials[i] #generate empty matrix to store results tempMatrix <- matrix(0, nrow=nSims, ncol=length(parms)) for(j in 1:nSims){ #do a loop over total number of simulations required #generate synthetic data tempData <- rwiener(n=tempTrials, alpha=parms[1], tau=parms[2], beta=parms[3], delta=parms[4]) #now fit the model to this synthetic data findParms <- optim(startParms, wiener_deviance, dat=tempData, method="Nelder-Mead") #store the best-fitting parameters tempMatrix[j, ] <- findParms$par } #store the matrix of results as a unique matrix assign(paste("data", tempTrials, sep = ""), tempMatrix) } #do the plotting par(mfrow=c(2,2)) #change layout to 2x2 #plot boundary boxplot(data25[, 1], data50[, 1], data100[, 1], data250[, 1], data500[, 1], data1000[, 1], names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Boundary Separation") abline(a=parms[1], b=0) #plot non-decision time boxplot(data25[, 2], data50[, 2], data100[, 2], data250[, 2], data500[, 2], data1000[, 2], names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Non-Decision Time") abline(a=parms[2], b=0) #plot initial bias boxplot(data25[, 3], data50[, 3], data100[, 3], data250[, 3], data500[, 3], data1000[, 3], names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Initial Bias") abline(a=parms[3], b=0) #plot drift rate boxplot(data25[, 4], data50[, 4], data100[, 4], data250[, 4], data500[, 4], data1000[, 4], names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Drift Rate") abline(a=parms[4], b=0)