
trimr: An R Package of Response Time Trimming Methods

Response times are a dominant dependent variable in cognitive psychology (and other areas). Raw response times typically need to undergo some trimming before being submitted to inferential analysis, because RTs suffer from outliers: a small proportion of RTs that lie at the extremes of the response time distribution and are thought to arise from processes not under investigation.

There is a wide array of response time trimming methods, which I outlined in a previous post. Some are very simple to implement (such as removing all RTs slower than 2 seconds). Some are intermediate in difficulty (such as removing all RTs slower than 2.5 standard deviations above the mean of each participant for each condition). Some are downright tricky to implement, such as the modified recursive procedure of Van Selst & Jolicoeur (1994).

To make response time trimming simpler for the researcher, I have developed a small R package—trimr—that takes raw RTs for all participants and experimental conditions, performs any trimming method the user requires, and returns data for all participants and conditions ready for inferential testing.

Below I provide an overview of how to use trimr.

Overview

trimr is an R package that implements the most commonly used response time trimming methods, allowing the user to go from a raw data file to a finalised data file ready for inferential statistical analysis.

trimr is available from CRAN. To download it, open R and type:

install.packages("trimr")

To install the latest version of trimr (i.e., the development version of the next release), install devtools, and then install directly from GitHub by using:

# install devtools
install.packages("devtools")

# install trimr from GitHub
devtools::install_github("JimGrange/trimr")

(To report any bugs in trimr—which are likely—please see my GitHub account for trimr, and click on “issues” in the top right.)

The trimming functions fall broadly into three families (together with the function names for each method implemented in trimr):

  1. Absolute Value Criterion:
    • absoluteRT
  2. Standard Deviation Criterion:
    • sdTrim
  3. Recursive / Moving Criterion:
    • nonRecursive
    • modifiedRecursive
    • hybridRecursive

Example Data

trimr ships with some example data—“exampleData”—that the user can explore the trimming functions with. The data are simulated (i.e., not real) and come from 32 subjects in a task switching experiment, where RT and accuracy were recorded for two experimental conditions: Switch, when the task switched from the previous trial, and Repeat, when the task repeated from the previous trial.

# load the trimr package
library(trimr)

# activate the data
data(exampleData)

# look at the top of the data
head(exampleData)
##   participant condition   rt accuracy
## 1           1    Switch 1660        1
## 2           1    Switch  913        1
## 3           1    Repeat 2312        1
## 4           1    Repeat  754        1
## 5           1    Switch 3394        1
## 6           1    Repeat  930        1

The exampleData consists of 4 columns:

  • participant: Codes the number of each participant in the experiment
  • condition: In this example, there are two experimental conditions: “Switch”, and “Repeat”.
  • rt: Logs the response time of the participant in milliseconds.
  • accuracy: Logs the accuracy of the response. 1 codes a correct response, 0 an error response.

At a minimum, users analysing their own data need columns with these names in the data frame they pass to trimr. The user can use RTs logged in milliseconds (as here) or in seconds (e.g., 0.657). The user can control the number of decimal places to round the trimmed data to.
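For users preparing their own data, the reshaping step can be sketched as follows. The original column names here—subj, cond, time, correct—are hypothetical stand-ins for whatever your own data file uses:

```r
# a minimal sketch of reshaping your own data into the format
# trimr expects; "subj", "cond", "time", and "correct" are
# hypothetical column names standing in for your own data
myData <- data.frame(
  subj    = c(1, 1, 2, 2),
  cond    = c("Switch", "Repeat", "Switch", "Repeat"),
  time    = c(654, 987, 712, 1203),   # RTs logged in milliseconds
  correct = c(1, 1, 0, 1)
)

# rename to the four columns trimr requires
trimrData <- data.frame(
  participant = myData$subj,
  condition   = myData$cond,
  rt          = myData$time,
  accuracy    = myData$correct
)
```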


Absolute Value Criterion

The absolute value criterion is the simplest of all of the trimming methods available (except, of course, for having no trimming). An upper- and a lower-criterion are set, and any response time that falls outside of these limits is removed. The function that performs this trimming method in trimr is called absoluteRT.

absoluteRT

In this function, the user declares lower- and upper-criteria for RT trimming (the minRT and maxRT arguments, respectively); RTs outside of these criteria are removed. Note that these criteria must be in the same unit as the RTs logged in the data frame being used. The function also has some other important arguments:

  • omitErrors: If the user wishes error trials to be removed from the trimming, this needs to be set to TRUE (it is set to this by default). Alternatively, some users may wish to keep error trials included. Therefore, set this argument to FALSE.
  • returnType: Here, the user can control how the data are returned. “raw” returns trial-level data after the trials with trimmed RTs are removed; “mean” returns calculated mean RT per participant per condition after trimming; “median” returns calculated median RT per participant per condition after trimming. This is set to “mean” by default.
  • digits: How many digits to round the data to after trimming? If the user has a data frame where the RTs are recorded in seconds (e.g., 0.657), this argument can be left at its default value of 3. However, if the data are logged in milliseconds, it might be best to change this argument to zero, so there are no decimal places in the rounding of RTs (e.g., 657).

In this first example, let’s trim the data using criteria of RTs less than 150 milliseconds and greater than 2,000 milliseconds, with error trials removed before trimming commences. Let’s also return the mean RTs for each condition, and round the data to zero decimal places.

# perform the trimming
trimmedData <- absoluteRT(data = exampleData, minRT = 150, 
                          maxRT = 2000, digits = 0)

# look at the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1    901    742
## 2           2   1064    999
## 3           3   1007    802
## 4           4   1000    818
## 5           5   1131    916
## 6           6   1259   1067

Note that trimr returns a data frame with one row per participant (logged in the participant column) and a separate column for each experimental condition in the data.

If the user wishes to receive back trial-level data, change the “returnType” argument to “raw”:

# perform the trimming
trimmedData <- absoluteRT(data = exampleData, minRT = 150, 
                          maxRT = 2000, returnType = "raw", 
                          digits = 0)

# look at the top of the data
head(trimmedData)
##    participant condition   rt accuracy
## 1            1    Switch 1660        1
## 2            1    Switch  913        1
## 4            1    Repeat  754        1
## 6            1    Repeat  930        1
## 7            1    Switch 1092        1
## 11           1    Repeat  708        1

Now, the data frame returned is in the same shape as the initial data file, but rows containing trimmed RTs are removed.
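For intuition, the absolute-value trim with returnType = "raw" behaves like simple subsetting in base R. The following is only a sketch of the logic (with toy data standing in for exampleData), not trimr's actual implementation; details such as whether the criteria are inclusive may differ:

```r
# toy data standing in for exampleData, so the sketch is self-contained
rawData <- data.frame(
  participant = c(1, 1, 1, 1),
  condition   = c("Switch", "Repeat", "Switch", "Repeat"),
  rt          = c(1660, 2312, 120, 754),
  accuracy    = c(1, 1, 1, 0)
)

minRT <- 150
maxRT <- 2000

# drop error trials first (as omitErrors = TRUE does), then keep
# only RTs that fall inside the criteria
keep <- rawData$accuracy == 1 & rawData$rt > minRT & rawData$rt < maxRT
manualTrim <- rawData[keep, ]
```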


Standard Deviation Criterion

This trimming method uses a standard deviation multiplier as the upper criterion for RT removal (users still need to enter a lower-bound manually). For example, this method can be used to trim all RTs 2.5 standard deviations above the mean RT. This trimming can be done per condition (e.g., 2.5 SDs above the mean of each condition), per participant (e.g., 2.5 SDs above the mean of each participant), or per condition per participant (e.g., 2.5 SDs above the mean of each participant for each condition).

sdTrim

In this function, the user declares a lower-bound on RT trimming (e.g., 150 milliseconds) and an upper-bound in standard deviations. The number of standard deviations used is set by the sd argument. How this is used varies depending on the values the user passes to two important function arguments:

  • perCondition: If set to TRUE, the trimming will occur above the mean of each experimental condition in the data file.
  • perParticipant: If set to TRUE, the trimming will occur above the mean of each participant in the data file.

Note that if both are set to TRUE, the trimming will occur per participant per condition (e.g., if sd is set to 2.5, the function will trim RTs 2.5 SDs above the mean RT of each participant for each condition).

In this example, let’s trim RTs faster than 150 milliseconds, and greater than 3 SDs above the mean of each participant, and return the mean RTs:

# trim the data
trimmedData <- sdTrim(data = exampleData, minRT = 150, sd = 3, 
                      perCondition = FALSE, perParticipant = TRUE, 
                      returnType = "mean", digits = 0)

# look at the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1   1042    775
## 2           2   1136   1052
## 3           3   1020    802
## 4           4   1094    834
## 5           5   1169    919
## 6           6   1435   1156

Now, let’s trim per condition per participant:

# trim the data
trimmedData <- sdTrim(data = exampleData, minRT = 150, sd = 3, 
                      perCondition = TRUE, perParticipant = TRUE, 
                      returnType = "mean", digits = 0)

# look at the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1   1099    742
## 2           2   1136   1038
## 3           3   1028    802
## 4           4   1103    834
## 5           5   1184    916
## 6           6   1461   1136
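The per-participant case (perCondition = FALSE, perParticipant = TRUE) can be sketched in base R as follows. This is only an illustration of the logic, not trimr's internals; nSD plays the role of trimr's sd argument:

```r
# a sketch of a per-participant SD trim: compute each participant's
# upper cut-off from their own mean and SD, then subset
sdTrimSketch <- function(data, minRT, nSD) {
  trimmed <- lapply(split(data, data$participant), function(d) {
    cutoff <- mean(d$rt) + nSD * sd(d$rt)   # per-participant upper bound
    d[d$rt > minRT & d$rt < cutoff, ]
  })
  do.call(rbind, trimmed)
}
```

Setting both flags to TRUE would simply mean splitting on the participant–condition combination instead of on participant alone.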

Recursive / Moving Criterion

Three functions in this family implement the trimming methods proposed & discussed by van Selst & Jolicoeur (1994): nonRecursive, modifiedRecursive, and hybridRecursive. van Selst & Jolicoeur noted that the outcome of many trimming methods is influenced by the sample size (i.e., the number of trials) being considered, thus potentially producing bias. For example, even if RTs are drawn from identical positively-skewed distributions, a “per condition per participant” SD procedure (see sdTrim above) would result in a higher mean estimate for small sample sizes than larger sample sizes. This bias was shown to be removed when a “moving criterion” (MC) was used; this is where the SD used for trimming is adapted to the sample size being considered.

nonRecursive

The non-recursive method proposed by van Selst & Jolicoeur (1994) is very similar to the standard deviation method outlined above, with the exception that the user does not specify the SD to use as the upper bound. Instead, the SD used for the upper bound is determined by the sample size of the RTs being passed to the trimming function, with larger SDs being used for larger sample sizes. Also, the function only trims per participant per condition.

The nonRecursive function checks the sample size of the data being passed to it, and looks up the SD criterion required for that sample size in a data file contained in trimr called linearInterpolation. Should the user wish to see this data file (although they never need to access it directly), type:

# load the data
data(linearInterpolation)

# show the first 20 rows (there are 100 in total)
linearInterpolation[1:20, ]
##    sampleSize nonRecursive modifiedRecursive
## 1           1        1.458             8.000
## 2           2        1.680             6.200
## 3           3        1.841             5.300
## 4           4        1.961             4.800
## 5           5        2.050             4.475
## 6           6        2.120             4.250
## 7           7        2.173             4.110
## 8           8        2.220             4.000
## 9           9        2.246             3.920
## 10         10        2.274             3.850
## 11         11        2.310             3.800
## 12         12        2.326             3.750
## 13         13        2.334             3.736
## 14         14        2.342             3.723
## 15         15        2.350             3.709
## 16         16        2.359             3.700
## 17         17        2.367             3.681
## 18         18        2.375             3.668
## 19         19        2.383             3.654
## 20         20        2.391             3.640

Notice there are two columns. The nonRecursive function will only look in the nonRecursive column; the other column is used by the modifiedRecursive function, discussed below. If the sample size of the current set of data is 16 RTs (for example), the function will use an upper SD criterion of 2.359, and will proceed much like the sdTrim function’s operations.
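The lookup itself can be sketched with base R's approx(), which linearly interpolates between tabled values. The numbers below are the first rows of the linearInterpolation table shown above; how trimr treats sample sizes beyond the table is not covered by this sketch:

```r
# a sketch of the SD-criterion lookup; the values are the first
# eight rows of the linearInterpolation table shown above
lookup <- data.frame(
  sampleSize   = 1:8,
  nonRecursive = c(1.458, 1.680, 1.841, 1.961,
                   2.050, 2.120, 2.173, 2.220)
)

# linear interpolation returns a criterion for any sample size
# inside the table's range
getCriterion <- function(n) {
  approx(lookup$sampleSize, lookup$nonRecursive, xout = n)$y
}
```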

Note that this function can only return mean trimmed RTs (i.e., there is no “returnType” argument for this function).

# trim the data
trimmedData <- nonRecursive(data = exampleData, minRT = 150, 
                            digits = 0)

# see the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1   1053    732
## 2           2   1131   1026
## 3           3   1017    799
## 4           4   1089    818
## 5           5   1169    908
## 6           6   1435   1123

modifiedRecursive

The modifiedRecursive function is more involved than the nonRecursive function. It performs trimming in cycles. It first temporarily removes the slowest RT from the distribution; then, the mean of the sample is calculated, and the cut-off value is calculated using a certain number of SDs around the mean, with the value for SD being determined by the current sample size. In this procedure, the required SD decreases with increasing sample size (cf. the nonRecursive method, where SD increases with sample size; see the linearInterpolation data file above); see Van Selst and Jolicoeur (1994) for justification.

The temporarily removed RT is then returned to the sample, and the fastest and slowest RTs are then compared to the cut-off, and removed if they fall outside. This process is then repeated until no outliers remain, or until the sample size drops below four. The SD used for the cut-off is thus dynamically altered based on the sample size of each cycle of the procedure, rather than static like the nonRecursive method.

# trim the data
trimmedData <- modifiedRecursive(data = exampleData, minRT = 150, 
                                 digits = 0)

# see the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1    792    691
## 2           2   1036    927
## 3           3    958    716
## 4           4   1000    712
## 5           5   1107    827
## 6           6   1309   1049
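One interpretation of the cycles described above can be sketched as below. This is an illustrative sketch, not trimr's code: sdLookup is a hypothetical stand-in for the linearInterpolation lookup, and details (such as whether the criterion SD is looked up for the full or the reduced sample) may differ from the published procedure:

```r
# a sketch of the modified recursive cycles on a vector of RTs;
# sdLookup is a hypothetical function mapping sample size -> SD criterion
modifiedRecursiveSketch <- function(rts, sdLookup) {
  repeat {
    if (length(rts) < 4) break           # stopping rule
    # temporarily remove the slowest RT before computing the mean
    temp  <- rts[-which.max(rts)]
    nSD   <- sdLookup(length(rts))       # SD criterion for current sample size
    upper <- mean(temp) + nSD * sd(temp)
    lower <- mean(temp) - nSD * sd(temp)
    # return the removed RT; compare fastest and slowest to the cut-offs
    drop <- c(
      if (max(rts) > upper) which.max(rts),
      if (min(rts) < lower) which.min(rts)
    )
    if (length(drop) == 0) break         # no outliers remain
    rts <- rts[-drop]
  }
  rts
}
```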

hybridRecursive

van Selst and Jolicoeur (1994) reported slight opposing trends of the non-recursive and modified-recursive trimming methods (see page 648, footnote 2). They therefore, in passing, suggested a “hybrid-recursive” method might balance the opposing trends. The hybrid-recursive method simply takes the average of the non-recursive and the modified-recursive methods.

# trim the data
trimmedData <- hybridRecursive(data = exampleData, minRT = 150, 
                               digits = 0)

# see the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1    923    711
## 2           2   1083    976
## 3           3    987    757
## 4           4   1044    765
## 5           5   1138    867
## 6           6   1372   1086
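The averaging step itself is simple. As a sketch, using the participant 1 means produced above by nonRecursive (Switch 1053, Repeat 732) and modifiedRecursive (Switch 792, Repeat 691):

```r
# a sketch of the hybrid step: average the trimmed means from the
# non-recursive and modified-recursive methods (values taken from
# participant 1 in the outputs above)
nonRec <- c(Switch = 1053, Repeat = 732)
modRec <- c(Switch = 792,  Repeat = 691)
hybrid <- (nonRec + modRec) / 2
```

The result (922.5 and 711.5) is consistent, up to rounding, with the hybridRecursive output for participant 1 above.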

Data from Factorial Designs

In the example data that ships with trimr, the RT data comes from just two conditions (Switch vs. Repeat), which are coded in the column “condition”. However, in experimental psychology, factorial designs are prevalent, where RT data comes from more than one independent variable, with each IV having multiple levels. How can trimr deal with this format?

First, let’s re-shape the exampleData set to how data might be stored from a factorial design. Let there be two IVs, each with two levels:

  1. taskSequence: Switch vs. Repeat
  2. reward: Reward vs. NoReward

The taskSequence factor codes whether the task has switched or repeated from the task on the previous trial (as before). The reward factor codes whether the participant was presented with a reward on the current trial (presented randomly). Let’s reshape our data frame to match this fictitious experimental scenario:

# get the example data that ships with trimr
data(exampleData)

# pass it to a new variable
newData <- exampleData

# add a column called "taskSequence" 
newData$taskSequence <- newData$condition

# add a column called "reward" 
# Fill it with random entries, just for example. 
# This uses R's "sample" function
newData$reward <- sample(c("Reward", "NoReward"), nrow(newData), 
                         replace = TRUE)

# delete the "condition" column
newData <- subset(newData, select = -condition)

# now let's look at our new data
head(newData)
##   participant   rt accuracy taskSequence   reward
## 1           1 1660        1       Switch   Reward
## 2           1  913        1       Switch NoReward
## 3           1 2312        1       Repeat NoReward
## 4           1  754        1       Repeat   Reward
## 5           1 3394        1       Switch   Reward
## 6           1  930        1       Repeat   Reward

This now looks how data typically comes in from a factorial design. Now, to get trimr to work on this, we need to create a new column called “condition”, and to place in this column the levels of all factors in the design. For example, if the first trial in our newData has taskSequence = Switch and reward = NoReward, we would like our condition entry for this trial to read “Switch_NoReward”. This is simple to do using R’s “paste” function. (Note that this code can be adapted to deal with any number of factors.)

# add a new column called "condition".
# Fill it with information about our factors
newData$condition <- paste(newData$taskSequence, "_", newData$reward, sep = "")

# look at the data
head(newData)
##   participant   rt accuracy taskSequence   reward       condition
## 1           1 1660        1       Switch   Reward   Switch_Reward
## 2           1  913        1       Switch NoReward Switch_NoReward
## 3           1 2312        1       Repeat NoReward Repeat_NoReward
## 4           1  754        1       Repeat   Reward   Repeat_Reward
## 5           1 3394        1       Switch   Reward   Switch_Reward
## 6           1  930        1       Repeat   Reward   Repeat_Reward

Now we can pass this data frame to trimr, and it will work perfectly.

# trim the data
trimmedData <- sdTrim(newData, minRT = 150, sd = 2.5)

# check it worked
# (remove participant column so it fits in blog
# window)
head(trimmedData[, -1])
##   Switch_Reward Switch_NoReward Repeat_NoReward Repeat_Reward
## 1      1053.348        1054.317         702.924       769.549
## 2      1147.131        1106.321        1009.798      1030.902
## 3       998.140        1030.262         791.779       799.584
## 4      1105.244        1073.707         835.189       807.512
## 5      1191.217        1148.509         902.888       903.907
## 6      1483.667        1398.000        1087.029      1160.127
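The paste approach shown earlier scales to any number of factors; base R's interaction() builds the combined condition column in one call. A sketch with a hypothetical third factor ("block"):

```r
# a sketch of building the combined "condition" column for any
# number of factors; "block" is a hypothetical third factor
factData <- data.frame(
  taskSequence = c("Switch", "Repeat"),
  reward       = c("Reward", "NoReward"),
  block        = c("Early", "Late")
)

factData$condition <- as.character(
  interaction(factData[, c("taskSequence", "reward", "block")],
              sep = "_")
)
```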

References

Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R package version 1.0.1.

Van Selst, M. & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. Quarterly Journal of Experimental Psychology, 47 (A), 631–650.


On the diversity of response time trimming methods

Below I outline an argument for moving towards a clearer, more objective, way to trim response times. I first discuss the importance of response time trimming, and then outline various methods commonly used by researchers. I then quantify the diversity of these methods by reviewing 3 years of articles from two prominent cognitive psychology journals, and catalogue usage of each method. I then suggest that a technique introduced by Van Selst and Jolicoeur (1994) might—and here I stress might—be a solution to the lack of objectivity in choosing which method to use. To aid its use, I provide R scripts for the trimming method by Van Selst and Jolicoeur. 

I don’t usually intend to write posts as long as this, but the text below represents a small comment paper I have been trying—unsuccessfully—to publish. Rather than it sit in my drawer, I thought I would share it here. Comments welcomed. 

Overview

Response times (RT) are an incredibly popular dependent variable in cognitive psychology, whereby researchers typically take a measure of central tendency of the distribution of RTs for a given condition (often the mean, but sometimes the median) to infer the time-course of discrete psychological processes. The challenge facing researchers is how best to deal with so-called outliers: a small proportion of RTs that lie at the extremes of the RT distribution and are thought to arise from processes not under investigation. These outliers can occur at the slow end of the distribution (e.g. due to distraction, lack of alertness, etc.) or at the faster end (e.g. a rapid anticipatory response). As these outliers can influence the estimate of central tendency—and hence contaminate the estimate of the psychological process under investigation—researchers typically remove outliers (“trimming”) before conducting inferential analysis.

But what method should be used to identify outliers? This question turns out to be very challenging to answer (with no necessarily correct answer); as such, there exists a vast and diverse range of methods typically employed. Some statisticians in fact recommend not trimming RT data at all (see e.g. Ulrich & Miller, 1994). Alternatives include taking the median (which is less affected by extreme scores than the arithmetic mean), fitting a model to the entire RT distribution (Heathcote, Popiel, & Mewhort, 1991), analysing cumulative distribution frequencies (Ratcliff, 1979; see Houghton & Grange, 2011), or applying one of a class of process models of response time (e.g. Wagenmakers, 2009).

However, if RT trimming is to be used for calculation of mean RT, it is desirable that the method employed is as objective as possible. The inconsistency of possible methods—at best—leaves researchers in a quandary as to how best to process their data; at worst, it increases researcher degrees of freedom (Simmons, Nelson, & Simonsohn, 2011): the increased flexibility in choosing which RT trimming method to use might increase false-positive rates.

The purpose of this post is to highlight the diversity in methods so that researchers are cognizant of the issue; I also propose that researchers consider establishing a standardised method of response time trimming using objective criteria. My issue is not with the trimming methods per se, but rather the potential for a lack of objectivity in selecting which method to use. In this day of increasing concern over replicability in psychological science, it is imperative to start the discussion regarding a uniform method of RT trimming. One candidate method was introduced by Van Selst and Jolicoeur (1994), but it is relatively complicated to implement. To facilitate its use, researchers have provided routines in proprietary software (e.g. SPSS; Thompson, 2006); in addition, I provide routines in the statistical package R (R Development Core Team, 2012) that implement this method.

Quantifying the Diversity

In an attempt to informally quantify the diversity of trimming methods employed, Table 1 catalogues the frequency of a number of diverse trimming methods reported in the 2010–2012 volumes of Journal of Experimental Psychology: Learning, Memory, & Cognition, and Quarterly Journal of Experimental Psychology. What strikes me is the sheer number of trimming-options researchers have to choose from. 

No trimming is where no clear report of a trimming method could be found in the article. This, of course, does not necessarily mean that trimming was not employed, so is likely an over-estimate of the true number of studies that did not employ trimming. Absolute cut-off involves identifying an absolute upper- and lower-limit on RTs to include in the final analysis (e.g. “RTs faster than 200ms and slower than 2,000ms were excluded from data analysis”). Standard deviation (SD) trimming comes in different guises: Global SD trim removes any RTs that fall outside of a certain number of SDs from the global mean (i.e. across all participants and conditions; e.g. “RTs slower than 2.5 SD of the mean were excluded”); per cell SD trimming removes RTs outside of a certain number of SDs from the global mean of each experimental cell (“RTs slower than 2.5 SD of the mean of each experimental condition were excluded”); per participant trims RTs outside of certain number of SDs from the mean of each participant’s overall RT (“RTs slower than 2.5SD of the mean of each participant were excluded”); per cell, per participant is arguably more fair, as it trims RTs from all participants for all conditions, and hence will certainly trim from all experimental conditions (e.g. “RTs slower than 2.5 SD of the mean of each participant for each condition were excluded”). 

 

Table 1

Lack of Objectivity?

The main issue with all of the above trimming methods is their potential lack of objectivity; for example, when using an absolute cut-off, what criteria should one use for deciding on the upper limit? In the articles reviewed, the choice ranged from 800 milliseconds (ms) to 10,000 ms. Obviously, the choice will be influenced by the difficulty of the task, but even with a relatively simple task, how does one choose whether to use 2,000 ms or 2,500 ms as the upper limit? The lower limit is potentially simpler, as it defines the value below which responses were likely anticipatory (i.e. unrealistically fast); but even in this simpler case, there was a wide range of limits used, ranging from 50 ms to 400 ms. As such, a popular alternative to the absolute cut-off is to allow the data themselves to identify outliers, by removing RTs above a certain number of standard deviations (SDs) above the mean. However, this process too might suffer from a lack of objectivity: how does one decide on the SD value to use (2.5 SDs or 3 SDs)? In the articles reviewed, the SD chosen for the trimming ranged from 2 to 4.

 

A Potential Solution?

A strong candidate for an objective response time trimming method was introduced by Van Selst and Jolicoeur (1994). They noted that the outcome of many trimming methods is influenced by the sample size (i.e. number of trials) being considered, thus potentially producing bias. For example, even if RTs are drawn from identical positively-skewed distributions, a “per cell per participant” SD procedure would result in a higher mean estimate for a small sample size “condition” than a large sample size condition. This bias was shown to be removed when a “moving criterion” (MC) was used; this is where the SD used for trimming is dynamically adapted to the sample size being considered. This meets the criteria for objectivity in a trimming method as the SD used for calculating cut-off values is not determined by the researcher, but by the sample size under investigation. Thus, this method is an excellent candidate for a standardised trimming procedure.

Van Selst and Jolicoeur (1994) introduced two MC methods that reduced the bias with sample size: The non-recursive (MC) method removes any RT that falls outside a certain number of SDs from the mean (of the whole sample) being considered, with the value of SD being determined by the sample size of the distribution, with a lower SD value being used for smaller sample sizes (see Table 4 in Van Selst & Jolicoeur). The modified recursive (MC) procedure performs trimming in cycles. It first temporarily removes the slowest RT from the distribution; then, the mean of the sample is calculated, and the cut-off value is again calculated using a certain number of SDs around the mean, with the value for SD being determined by sample size (in this procedure, required SD decreases with increased sample size; see Van Selst & Jolicoeur for justification). The temporarily removed RT is then returned to the sample, and the fastest and slowest RTs are then compared to the cut-off, and removed if they fall outside. This process is then repeated until no outliers remain, or until the sample size drops below four. The SD used for the cut-off is thus dynamically altered based on the sample size of each cycle of the procedure. Van Selst and Jolicoeur reported slight opposing trends of these two methods, suggesting a “hybrid moving criterion” method (see their footnote 2, page 648) which simply takes the average of the non-recursive (MC) and modified recursive (MC) procedures. 

Although the non-recursive (MC) procedure is relatively simple to implement with equal sample sizes between conditions and participants in standard software such as Excel, the modified recursive (MC) procedure and the hybrid present some technical challenges. Specifically, the modified recursive procedure requires many cycles of removing individual RTs, calculating means, establishing a dynamic SD criterion based on the current sample size on the current cycle, replacement of RTs, and trimming; the procedure must also be aware of the stopping rule when the sample drops below four. 

Of course, the Van Selst and Jolicoeur (1994) method is just one possible approach, and the field might not reach a consensus as to which method could become the standard (or might not even want a consensus). As such, the field might continue (quite understandably) to use any one of a number of methods; but at the very least, I recommend that researchers should justify explicitly why they chose the method of RT trimming they did, and potentially demonstrate whether the pattern of results changes depending on the method employed. Such disclosure will allow readers to assess to what degree the results presented might be reliant on the trimming method chosen.

 

R Script for Implementing Van Selst & Jolicoeur (1994)

To facilitate implementation of this method for researchers, I provide routines in the statistical package R; this set of scripts is capable of executing all three of the methods recommended by Van Selst and Jolicoeur (1994), and also includes a “quick start guide” for users unfamiliar with R. The scripts can be downloaded from GitHub.

 

References

Heathcote, A., Popiel, S.J., & Mewhort, D.J.K. (1991). Analysis of response time distributions—An example using the Stroop task. Psychological Bulletin, 109, 340-347.

Houghton, G. & Grange, J.A. (2011). CDF-XL: Computing cumulative distribution frequencies of reaction time data in Excel. Behavior Research Methods, 43, 1023-1032.

Ratcliff, R. (1979). Group reaction time distributions and an analysis of distribution statistics. Psychological Bulletin, 86, 446-461.

R Development Core Team. (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from http://www.R-project.org/.

Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.

Thompson, G.L. (2006). An SPSS implementation of the non-recursive outlier detection procedure with shifting z-score criterion (Van Selst & Jolicoeur, 1994). Behavior Research Methods, 38, 344-352.

Ulrich, R. & Miller, J. (1994). Effects of truncation on reaction time analysis. Journal of Experimental Psychology: General, 123, 34-80.

Van Selst, M. & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. Quarterly Journal of Experimental Psychology, 47 (A), 631-650.

Wagenmakers, E.-J. (2009). Methodological and empirical developments for the Ratcliff diffusion model of response times and accuracy. European Journal of Cognitive Psychology, 21, 641-671.

Response Time Modelling – How Many Trials?

Over the past year I have become increasingly interested in using models of simple decision making processes, as measured by two-choice response time tasks. A simple 2-choice task might ask participants to judge whether a presented number is odd or even, for example. One popular model of 2-choice RT is the Ratcliff Diffusion Model. A simplified example of this model is presented below.

[Figure: a simplified schematic of the diffusion model]

The idea behind the model is that, once presented, the stimulus elicits an evidence-accumulation process. That is, the cognitive system begins sampling the stimulus, and evidence accumulates in a noisy manner towards one of two response boundaries. Once a boundary has been reached, the corresponding response is executed. One boundary reflects the correct response, and the other the incorrect response. The average rate of evidence accumulation is reflected in the drift rate parameter. The amount of evidence that must accumulate before a response is made is governed by the distance between the two boundaries (the boundary separation parameter). The starting point of the diffusion process between the two boundaries is governed by a bias parameter; the bias typically lies midway between the boundaries, as (for example) 50% of trials will present odd stimuli, so the participant has no reason to prefer one response over the other. The final parameter, non-decision time, reflects the time taken to encode the stimulus and execute the motor response.

The model is complex mathematically and computationally, but implementing it has been made easier in recent years by the release of several software packages. Among these are E-J Wagenmakers and colleagues’ EZ diffusion model, Vandekerckhove & Tuerlinckx’s DMAT toolbox for MatLab, and Voss and Voss’ Fast-DM. For an excellent overview of the diffusion model, as well as available packages, see this paper by E-J Wagenmakers.
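The accumulation process described above can be sketched in a few lines of base R: a noisy random walk with drift between two boundaries. This is an illustrative toy simulation only, not the model’s exact likelihood; the parameter names (a, v, z, ter) follow common diffusion-model notation, and the step size and noise scaling are my own assumptions.

```r
# Toy simulation of a single diffusion-model trial (base R only).
# a = boundary separation, v = drift rate, z = bias (proportion of a),
# ter = non-decision time; dt and s are illustrative settings.
simulate_trial <- function(a = 2, v = 0.5, z = 0.5, ter = 0.3,
                           dt = 0.001, s = 1) {
  x <- z * a   # starting point of the accumulator
  t <- 0
  while (x > 0 && x < a) {
    # evidence step: deterministic drift plus Gaussian noise
    x <- x + v * dt + s * sqrt(dt) * rnorm(1)
    t <- t + dt
  }
  list(rt = t + ter,  # decision time plus non-decision time
       response = ifelse(x >= a, "upper", "lower"))
}

set.seed(1)
simulate_trial()
```

Running this repeatedly produces the characteristic right-skewed RT distributions, with faster and more accurate responses as drift rate increases.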

However, a new package (the “RWiener” package) has just been published by Wabersich and Vandekerckhove (see paper here) for R-Statistics. R is my favourite thing EVER (seriously), so I was thrilled to see an R package that can model response times.
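The basic RWiener workflow is compact: generate synthetic data with known parameters via rwiener(), then recover them by minimising the Wiener deviance with optim(). This is a minimal sketch assuming the RWiener package is installed; the starting values passed to optim() are arbitrary illustrative choices.

```r
# Minimal RWiener workflow: simulate, then fit.
library(RWiener)

# generate 100 trials with known parameters
# (alpha = boundary, tau = non-decision time, beta = bias, delta = drift)
dat <- rwiener(n = 100, alpha = 2, tau = 0.3, beta = 0.5, delta = 0.5)

# recover the parameters by minimising the Wiener deviance
fit <- optim(c(1, 0.1, 0.1, 1), wiener_deviance, dat = dat,
             method = "Nelder-Mead")
fit$par  # estimated boundary, non-decision time, bias, drift
```

With only 100 trials the estimates will be noisy, which is exactly the question the simulation below addresses.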

How many trials?

The package is very straightforward: it can generate synthetic data with known parameters, and it can find best-fitting parameters for data you supply, which is superb. One question I was unsure of, though, is how accurate the parameter estimation routine is for varying numbers of trials. That is, how many trials do I need, as a minimum, for the package to produce accurate parameter estimates? With human data this is a very difficult (impossible?) question to answer, but it is tractable when fitting the model to artificially-generated data: because we know which parameters gave rise to the data (we tell the computer which to use), we can assess whether the estimated parameters are accurate by comparing them to the “known” parameters.

I set up a simulation where I addressed the “how many trials?” question. The structure is straightforward:

  • Decide on some “known” parameters
  • Generate N trials with the known parameter for X number of simulated “experiments”
  • Fit the model to each simulated experiment
  • Compare estimated parameters to known parameters
  • Repeat for various values of N

For the simulations reported, I simulated experiments with Ns of 25, 50, 100, 250, 500, and 1000 trials, spanning the ridiculously small (N = 25) to the ridiculously large (N = 1000). For each N, I conducted 100 simulated experiments. I then produced boxplots for each parameter, which allowed me to compare the estimated parameters to the known parameters (shown as a horizontal line on each plot). Ns that recover the known parameters well will have boxplots clustering closely around the horizontal line for each parameter.

[Figure: boxplots of estimated parameters for each N; horizontal lines mark the known parameter values]

Conclusion

The fitting routine produces reasonably accurate parameter estimates with as few as 250 trials; beyond this, there is not much advantage to increasing trial numbers. 100 trials is still acceptable, but more trials produce better estimates. There is no systematic bias in the estimates at low trial numbers, except for the non-decision time parameter: its estimates were too high with fewer trials, and this bias decreased as trial numbers increased.

The RWiener package is a welcome addition for fitting models to response time data, producing reasonably accurate parameter estimates with as few as ~250 trials.


R Code for above simulation

The code for the simulation is below. (Note, this takes quite a while to run, as my code is likely rather inelegant.)

rm(list = ls())

library(RWiener)

#how many trials in each sample condition?
nTrials <- c(25, 50, 100, 250, 500, 1000)
#how many simulations for each sample?
nSims <- 100

#known parameters
parms <- c(2.00, #boundary separation
           0.30, #non-decision time
           0.50, #initial bias
           0.50) #drift rate

#what are the starting parameters for the parameter minimisation routine?
startParms <- c(1, 0.1, 0.1, 1) #boundary, non-decision, initial bias, & drift

#start simulations here
for(i in 1:length(nTrials)){ #for each number of trials (25, 50, etc...)

  #get the current number of trials to generate
  tempTrials <- nTrials[i] 

  #generate empty matrix to store results
  tempMatrix <- matrix(0, nrow=nSims, ncol=length(parms))

  for(j in 1:nSims){ #do a loop over total number of simulations required

    #generate synthetic data
    tempData <- rwiener(n=tempTrials, alpha=parms[1], tau=parms[2], beta=parms[3],
                        delta=parms[4])
    #now fit the model to this synthetic data
    findParms <- optim(startParms, wiener_deviance, dat=tempData, method="Nelder-Mead")

    #store the best-fitting parameters
    tempMatrix[j, ] <- findParms$par
  }

  #store the matrix of results as a unique matrix
  assign(paste("data", tempTrials, sep = ""), tempMatrix)
}

#do the plotting

par(mfrow=c(2,2)) #change layout to 2x2

#plot boundary
boxplot(data25[, 1], data50[, 1], data100[, 1], data250[, 1], data500[, 1], data1000[, 1],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Boundary Separation")
abline(a=parms[1], b=0)

#plot non-decision time
boxplot(data25[, 2], data50[, 2], data100[, 2], data250[, 2], data500[, 2], data1000[, 2],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Non-Decision Time")
abline(a=parms[2], b=0)

#plot initial bias
boxplot(data25[, 3], data50[, 3], data100[, 3], data250[, 3], data500[, 3], data1000[, 3],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Initial Bias")
abline(a=parms[3], b=0)

#plot drift rate
boxplot(data25[, 4], data50[, 4], data100[, 4], data250[, 4], data500[, 4], data1000[, 4],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Drift Rate")
abline(a=parms[4], b=0)