trimr: An R Package of Response Time Trimming Methods

Response times are a dominant dependent variable in cognitive psychology (and other areas). It is common that raw response times need to undergo some trimming before being submitted to inferential analysis; this is because RTs typically suffer from outliers: a small proportion of RTs that lie at the extremes of the response time distribution and are thought to arise from processes not under investigation.

There is a wide array of response time trimming methods that I outlined in a previous post. Some are very simple to implement (such as removing all RTs slower than 2 seconds, for example). Some are intermediate in terms of difficulty of implementation (such as removing all RTs slower than 2.5 standard deviations above the mean of each participant of each condition). Some are downright tricky to implement, such as the modified recursive procedure of Van Selst & Jolicoeur (1994).

To make response time trimming simpler for the researcher, I have developed a small R package—trimr—that takes raw RTs for all participants and experimental conditions, performs any trimming method the user requires, and returns data for all participants and conditions ready for inferential testing.

Below I provide an overview of how to use trimr.

Overview

trimr is an R package that implements most commonly-used response time trimming methods, allowing the user to go from a raw data file to a finalised data file ready for inferential statistical analysis.

trimr is available from CRAN. To download it, open R and type:

install.packages("trimr")

To install the latest version of trimr (i.e., the development version of the next release), install devtools, and then install directly from GitHub by using:

# install devtools
install.packages("devtools")

# install trimr from GitHub
devtools::install_github("JimGrange/trimr")

(To report any bugs in trimr—which are likely—please see my GitHub account for trimr, and click on “issues” in the top right.)

The trimming functions fall broadly into three families (together with the function names for each method implemented in trimr):

  1. Absolute Value Criterion:
    • absoluteRT
  2. Standard Deviation Criterion:
    • sdTrim
  3. Recursive / Moving Criterion:
    • nonRecursive
    • modifiedRecursive
    • hybridRecursive

Example Data

trimr ships with some example data (“exampleData”) that the user can explore the trimming functions with. These data are simulated (i.e., not real), and come from 32 subjects in a task switching experiment, where RT and accuracy were recorded for two experimental conditions: Switch, when the task switched from the previous trial, and Repeat, when the task repeated from the previous trial.

# load the trimr package
library(trimr)

# activate the data
data(exampleData)

# look at the top of the data
head(exampleData)
##   participant condition   rt accuracy
## 1           1    Switch 1660        1
## 2           1    Switch  913        1
## 3           1    Repeat 2312        1
## 4           1    Repeat  754        1
## 5           1    Switch 3394        1
## 6           1    Repeat  930        1

The exampleData consists of 4 columns:

  • participant: Codes the number of each participant in the experiment
  • condition: In this example, there are two experimental conditions: “Switch”, and “Repeat”.
  • rt: Logs the response time of the participant in milliseconds.
  • accuracy: Logs the accuracy of the response. 1 codes a correct response, 0 an error response.

At a minimum, users working with their own data need columns with these names in the data frame they pass to trimr. RTs can be logged in milliseconds (as here) or in seconds (e.g., 0.657). The user can control the number of decimal places to which the trimmed data are rounded.
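
As a minimal sketch of getting your own data into this shape, suppose a data frame arrives with different column names. The names subject, trialType, latency, and correct below are hypothetical, invented purely for illustration; only the four target names come from the description above.

```r
# hypothetical raw data with column names that differ from exampleData
myData <- data.frame(
  subject   = c(1, 1, 2, 2),
  trialType = c("Switch", "Repeat", "Switch", "Repeat"),
  latency   = c(0.913, 0.754, 1.064, 0.999),  # RTs logged in seconds
  correct   = c(1, 1, 0, 1)
)

# rename the columns (by position) to the names used in exampleData
names(myData) <- c("participant", "condition", "rt", "accuracy")

# check that all four required columns are now present
requiredColumns <- c("participant", "condition", "rt", "accuracy")
all(requiredColumns %in% names(myData))
```

Renaming by position assumes the columns are already in this order; if they are not, rename each column individually instead.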


Absolute Value Criterion

The absolute value criterion is the simplest of all of the trimming methods available (except of course for having no trimming). An upper and a lower criterion are set, and any response time that falls outside of these limits is removed. The function that performs this trimming method in trimr is called absoluteRT.

absoluteRT

In this function, the user declares lower and upper criteria for RT trimming (minRT and maxRT arguments, respectively); RTs outside of these criteria are removed. Note that these criteria must be in the same unit as the RTs are logged in within the data frame being used. The function also has some other important arguments:

  • omitErrors: If the user wishes error trials to be removed before trimming, this needs to be set to TRUE (it is set to this by default). Users who wish to keep error trials in the data should set this argument to FALSE.
  • returnType: Here, the user can control how the data are returned. “raw” returns trial-level data after the trials with trimmed RTs are removed; “mean” returns calculated mean RT per participant per condition after trimming; “median” returns calculated median RT per participant per condition after trimming. This is set to “mean” by default.
  • digits: How many digits to round the data to after trimming? If the user has a data frame where the RTs are recorded in seconds (e.g., 0.657), this argument can be left at its default value of 3. However, if the data are logged in milliseconds, it might be best to change this argument to zero, so there are no decimal places in the rounding of RTs (e.g., 657).

In this first example, let’s trim the data using criteria of RTs less than 150 milliseconds and greater than 2,000 milliseconds, with error trials removed before trimming commences. Let’s also return the mean RTs for each condition, and round the data to zero decimal places.

# perform the trimming
trimmedData <- absoluteRT(data = exampleData, minRT = 150, 
                          maxRT = 2000, digits = 0)

# look at the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1    901    742
## 2           2   1064    999
## 3           3   1007    802
## 4           4   1000    818
## 5           5   1131    916
## 6           6   1259   1067

Note that trimr returns a data frame with each row representing each participant in the data file (logged in the participant column), and separate columns for each experimental condition in the data.

If the user wishes to receive back trial-level data, change the “returnType” argument to “raw”:

# perform the trimming
trimmedData <- absoluteRT(data = exampleData, minRT = 150, 
                          maxRT = 2000, returnType = "raw", 
                          digits = 0)

# look at the top of the data
head(trimmedData)
##    participant condition   rt accuracy
## 1            1    Switch 1660        1
## 2            1    Switch  913        1
## 4            1    Repeat  754        1
## 6            1    Repeat  930        1
## 7            1    Switch 1092        1
## 11           1    Repeat  708        1

Now, the data frame returned is in the same shape as the initial data file, but rows containing trimmed RTs are removed.
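
For readers who want to see what this criterion amounts to, here is a rough base-R sketch on a tiny made-up data set. It is not trimr's internal code: the toyData values are invented, and keeping RTs that sit exactly on a criterion is an assumption of the sketch.

```r
# a small made-up data set in the same layout as exampleData
toyData <- data.frame(
  participant = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition   = c("Switch", "Switch", "Repeat", "Repeat",
                  "Switch", "Switch", "Repeat", "Repeat"),
  rt          = c(900, 2500, 700, 120, 1100, 1000, 800, 950),
  accuracy    = c(1, 1, 1, 1, 1, 0, 1, 1)
)

minRT <- 150
maxRT <- 2000

# remove error trials first (what omitErrors = TRUE is described as doing)
correctOnly <- toyData[toyData$accuracy == 1, ]

# keep RTs inside the two criteria
# (assumption: RTs exactly equal to a criterion are kept)
kept <- correctOnly[correctOnly$rt >= minRT & correctOnly$rt <= maxRT, ]

# mean RT per participant (rows) and condition (columns)
meanRT <- tapply(kept$rt, list(kept$participant, kept$condition), mean)
meanRT
```

Here kept plays the role of the “raw” return type and meanRT the “mean” return type; swapping mean for median in the last step would give the “median” equivalent.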


Standard Deviation Criterion

This trimming method uses a standard deviation multiplier as the upper criterion for RT removal (users still need to enter a lower-bound manually). For example, this method can be used to trim all RTs 2.5 standard deviations above the mean RT. This trimming can be done per condition (e.g., 2.5 SDs above the mean of each condition), per participant (e.g., 2.5 SDs above the mean of each participant), or per condition per participant (e.g., 2.5 SDs above the mean of each participant for each condition).

sdTrim

In this function, the user declares a lower-bound on RT trimming (e.g., 150 milliseconds) and an upper-bound in standard deviations. The number of standard deviations used is set by the sd argument. How this is used varies depending on the values the user passes to two important function arguments:

  • perCondition: If set to TRUE, the trimming will occur above the mean of each experimental condition in the data file.
  • perParticipant: If set to TRUE, the trimming will occur above the mean of each participant in the data file.

Note that if both are set to TRUE, the trimming will occur per participant per condition (e.g., if sd is set to 2.5, the function will trim RTs 2.5 SDs above the mean RT of each participant for each condition).

In this example, let’s trim RTs faster than 150 milliseconds, and greater than 3 SDs above the mean of each participant, and return the mean RTs:

# trim the data
trimmedData <- sdTrim(data = exampleData, minRT = 150, sd = 3, 
                      perCondition = FALSE, perParticipant = TRUE, 
                      returnType = "mean", digits = 0)

# look at the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1   1042    775
## 2           2   1136   1052
## 3           3   1020    802
## 4           4   1094    834
## 5           5   1169    919
## 6           6   1435   1156

Now, let’s trim per condition per participant:

# trim the data
trimmedData <- sdTrim(data = exampleData, minRT = 150, sd = 3, 
                      perCondition = TRUE, perParticipant = TRUE, 
                      returnType = "mean", digits = 0)

# look at the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1   1099    742
## 2           2   1136   1038
## 3           3   1028    802
## 4           4   1103    834
## 5           5   1184    916
## 6           6   1461   1136
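
To make the “per participant per condition” logic concrete, here is a base-R sketch on made-up data. It illustrates the idea only and is not how sdTrim is written internally; toyData, nSD, and the other names are invented for the example.

```r
# made-up data: one participant, 12 trials per condition, one slow outlier
toyData <- data.frame(
  participant = rep(1, 24),
  condition   = rep(c("Switch", "Repeat"), each = 12),
  rt = c(900, 950, 1000, 1050, 1100, 980, 1020, 960, 1040, 990, 1010, 4000,
         700, 720, 740, 760, 780, 800, 820, 840, 860, 880, 900, 750)
)

minRT <- 150   # absolute lower bound
nSD   <- 2.5   # upper bound in standard deviations

# upper cut-off for every trial: mean + nSD * SD of that trial's own cell
# (one cell = one participant in one condition)
upper <- ave(toyData$rt, toyData$participant, toyData$condition,
             FUN = function(x) mean(x) + nSD * sd(x))

# keep trials between the lower bound and their cell's upper cut-off
kept <- toyData[toyData$rt >= minRT & toyData$rt <= upper, ]

# trimmed mean RT per condition for this participant
trimmedMeans <- tapply(kept$rt, kept$condition, mean)
trimmedMeans
```

Dropping toyData$condition from the ave() call would give the per participant version, and dropping toyData$participant the per condition version.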

Recursive / Moving Criterion

Three functions in this family implement the trimming methods proposed and discussed by Van Selst & Jolicoeur (1994): nonRecursive, modifiedRecursive, and hybridRecursive. Van Selst & Jolicoeur noted that the outcome of many trimming methods is influenced by the sample size (i.e., the number of trials) being considered, thus potentially producing bias. For example, even if RTs are drawn from identical positively-skewed distributions, a “per condition per participant” SD procedure (see sdTrim above) would result in a higher mean estimate for small sample sizes than for larger sample sizes. This bias was shown to be removed when a “moving criterion” (MC) was used; this is where the SD used for trimming is adapted to the sample size being considered.
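
The sample-size bias is easy to see in a small simulation. The sketch below is only an illustration under assumed settings: the RT distribution (a 300 ms floor plus an exponential tail), the sample sizes of 5 and 100, and the fixed 2.5 SD criterion are all my choices, not values from Van Selst & Jolicoeur.

```r
set.seed(1)

# mean of one sample after removing RTs more than nSD SDs above its mean
trimmedMean <- function(x, nSD = 2.5) {
  mean(x[x <= mean(x) + nSD * sd(x)])
}

# draw positively skewed "RTs": a 300 ms floor plus an exponential tail
simulateRTs <- function(n) 300 + rexp(n, rate = 1 / 200)

# average trimmed mean over many simulated samples of each size
smallSamples <- mean(replicate(5000, trimmedMean(simulateRTs(5))))
largeSamples <- mean(replicate(5000, trimmedMean(simulateRTs(100))))

c(n5 = smallSamples, n100 = largeSamples)
```

Both sets of samples come from the same distribution, yet the small samples should give the higher average. Part of the reason is that in a sample of n values no observation can lie more than (n - 1) / sqrt(n) sample SDs from the mean, so with n = 5 a 2.5 SD criterion never removes anything.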

nonRecursive

The non-recursive method proposed by Van Selst & Jolicoeur (1994) is very similar to the standard deviation method outlined above, with the exception that the user does not specify the SD to use as the upper bound. Instead, the SD used for the upper bound is decided by the sample size of the RTs being passed to the trimming function, with larger SDs being used for larger sample sizes. Also, the function only trims per participant per condition.

The nonRecursive function checks the sample size of the data being passed to it, and looks up the SD criterion required for the data’s sample size. The function looks in a data file contained in trimr called linearInterpolation. Should the user wish to see this data file (although the user will never need to access it if they are not interested), type:

# load the data
data(linearInterpolation)

# show the first 20 rows (there are 100 in total)
linearInterpolation[1:20, ]
##    sampleSize nonRecursive modifiedRecursive
## 1           1        1.458             8.000
## 2           2        1.680             6.200
## 3           3        1.841             5.300
## 4           4        1.961             4.800
## 5           5        2.050             4.475
## 6           6        2.120             4.250
## 7           7        2.173             4.110
## 8           8        2.220             4.000
## 9           9        2.246             3.920
## 10         10        2.274             3.850
## 11         11        2.310             3.800
## 12         12        2.326             3.750
## 13         13        2.334             3.736
## 14         14        2.342             3.723
## 15         15        2.350             3.709
## 16         16        2.359             3.700
## 17         17        2.367             3.681
## 18         18        2.375             3.668
## 19         19        2.383             3.654
## 20         20        2.391             3.640

Notice there are two columns of SD criteria alongside the sampleSize column. This current function will only look in the nonRecursive column; the other column is used by the modifiedRecursive function, discussed below. If the sample size of the current set of data is 16 RTs (for example), the function will use an upper SD criterion of 2.359, and will proceed much like the sdTrim function’s operations.

Note that this function can only return the mean trimmed RTs (i.e., there is no “returnType” argument for this function).

# trim the data
trimmedData <- nonRecursive(data = exampleData, minRT = 150, 
                            digits = 0)

# see the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1   1053    732
## 2           2   1131   1026
## 3           3   1017    799
## 4           4   1089    818
## 5           5   1169    908
## 6           6   1435   1123
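
The look-up-then-trim logic can be sketched in a few lines of base R for a single cell of data (one participant in one condition). This is my own illustration rather than trimr's source: the criteria are copied from the first 20 rows of linearInterpolation printed above, the RTs are made up, and taking the sample size before the minRT filter is an assumption.

```r
# SD criteria for sample sizes 1 to 20, copied from the nonRecursive column
# of linearInterpolation as printed above
nonRecursiveCriteria <- c(1.458, 1.680, 1.841, 1.961, 2.050, 2.120, 2.173,
                          2.220, 2.246, 2.274, 2.310, 2.326, 2.334, 2.342,
                          2.350, 2.359, 2.367, 2.375, 2.383, 2.391)

# trim one cell of RTs
nonRecursiveSketch <- function(rts, minRT) {
  # this sketch only covers the sample sizes in the short table above
  stopifnot(length(rts) <= length(nonRecursiveCriteria))

  # look up the criterion that matches this sample size
  criterion <- nonRecursiveCriteria[length(rts)]

  # remove RTs below minRT or above mean + criterion * SD
  upper <- mean(rts) + criterion * sd(rts)
  rts[rts >= minRT & rts <= upper]
}

# 16 made-up RTs, so the criterion used is 2.359
cellRTs <- c(700, 720, 740, 760, 780, 800, 820, 840,
             860, 880, 900, 790, 810, 770, 830, 3000)
trimmed <- nonRecursiveSketch(cellRTs, minRT = 150)
mean(trimmed)
```

In this example only the 3000 ms trial exceeds the cut-off, so 15 RTs remain.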

modifiedRecursive

The modifiedRecursive function is more involved than the nonRecursive function. This function performs trimming in cycles. It first temporarily removes the slowest RT from the distribution; then, the mean of the sample is calculated, and the cut-off value is calculated using a certain number of SDs around the mean, with the value for SD being determined by the current sample size. In this procedure, the required SD decreases with increased sample size (cf. the nonRecursive method, with increasing SDs with increasing sample size; see the linearInterpolation data file above); see Van Selst and Jolicoeur (1994) for justification.

The temporarily removed RT is then returned to the sample, and the fastest and slowest RTs are then compared to the cut-off, and removed if they fall outside. This process is then repeated until no outliers remain, or until the sample size drops below four. The SD used for the cut-off is thus dynamically altered based on the sample size of each cycle of the procedure, rather than static like the nonRecursive method.

# trim the data
trimmedData <- modifiedRecursive(data = exampleData, minRT = 150, 
                                 digits = 0)

# see the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1    792    691
## 2           2   1036    927
## 3           3    958    716
## 4           4   1000    712
## 5           5   1107    827
## 6           6   1309   1049
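
Because the cycles are easier to follow in code, here is a base-R sketch of the procedure as described above, applied to one cell of made-up RTs. It is an illustration, not trimr's implementation. Two details are assumptions on my part: the criterion is looked up using the full current sample size (including the temporarily removed RT), and the minRT filter is left out for brevity. The criteria are the first 20 values of the modifiedRecursive column printed earlier.

```r
# SD criteria for sample sizes 1 to 20, copied from the modifiedRecursive
# column of linearInterpolation as printed above
modifiedRecursiveCriteria <- c(8.000, 6.200, 5.300, 4.800, 4.475, 4.250,
                               4.110, 4.000, 3.920, 3.850, 3.800, 3.750,
                               3.736, 3.723, 3.709, 3.700, 3.681, 3.668,
                               3.654, 3.640)

modifiedRecursiveSketch <- function(rts) {
  # this sketch only covers the sample sizes in the short table above
  stopifnot(length(rts) <= length(modifiedRecursiveCriteria))

  repeat {
    n <- length(rts)
    # stop once the sample size has dropped below four
    if (n < 4) break

    # temporarily set aside the slowest RT
    withoutSlowest <- rts[-which.max(rts)]

    # cut-offs around the mean of the remaining RTs
    # (assumption: criterion is indexed by the full current sample size)
    criterion <- modifiedRecursiveCriteria[n]
    lower <- mean(withoutSlowest) - criterion * sd(withoutSlowest)
    upper <- mean(withoutSlowest) + criterion * sd(withoutSlowest)

    # with the slowest RT back in the sample, test the two extremes
    drop <- c(if (min(rts) < lower) which.min(rts),
              if (max(rts) > upper) which.max(rts))

    # stop when neither extreme is an outlier; otherwise remove and repeat
    if (length(drop) == 0) break
    rts <- rts[-drop]
  }
  rts
}

cellRTs <- c(420, 455, 480, 510, 530, 560, 590, 610, 650, 2900)
trimmed <- modifiedRecursiveSketch(cellRTs)
trimmed
```

With these ten RTs the first cycle removes the 2900 ms trial; on the second cycle neither extreme falls outside the cut-offs, so the loop stops with nine RTs left.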

hybridRecursive

Van Selst and Jolicoeur (1994) reported slight opposing trends of the non-recursive and modified-recursive trimming methods (see page 648, footnote 2). They therefore, in passing, suggested that a “hybrid-recursive” method might balance the opposing trends. The hybrid-recursive method simply takes the average of the non-recursive and the modified-recursive methods.

# trim the data
trimmedData <- hybridRecursive(data = exampleData, minRT = 150, 
                               digits = 0)

# see the top of the data
head(trimmedData)
##   participant Switch Repeat
## 1           1    923    711
## 2           2   1083    976
## 3           3    987    757
## 4           4   1044    765
## 5           5   1138    867
## 6           6   1372   1086
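
As a quick arithmetic check of the averaging, the first three rows of the nonRecursive and modifiedRecursive outputs printed earlier can be averaged by hand. The sketch uses only the rounded values shown in this post, so it is a sanity check, not a reimplementation.

```r
# first three rows of the nonRecursive and modifiedRecursive output above
nonRec <- data.frame(Switch = c(1053, 1131, 1017), Repeat = c(732, 1026, 799))
modRec <- data.frame(Switch = c(792, 1036, 958), Repeat = c(691, 927, 716))

# average the two sets of means, cell by cell
hybrid <- (nonRec + modRec) / 2
hybrid
```

Each of these averages lies within half a millisecond of the corresponding hybridRecursive value printed above; the small differences presumably arise because the averaging there is applied before rounding, whereas this check starts from already-rounded numbers.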

Data from Factorial Designs

In the example data that ships with trimr, the RT data comes from just two conditions (Switch vs. Repeat), which are coded in the column “condition”. However, in experimental psychology, factorial designs are prevalent, where RT data comes from more than one independent variable, with each IV having multiple levels. How can trimr deal with this format?

First, let’s re-shape the exampleData set to how data might be stored from a factorial design. Let there be two IVs, each with two levels:

  1. taskSequence: Switch vs. Repeat
  2. reward: Reward vs. NoReward

The taskSequence factor is coding whether the task has Switched or Repeated from the task on the previous trial (as before). The reward factor is coding whether the participant was presented with a reward or not on the current trial (presented randomly). Let’s reshape our data frame to match this fictitious experimental scenario:

# get the example data that ships with trimr
data(exampleData)

# pass it to a new variable
newData <- exampleData

# add a column called "taskSequence" 
newData$taskSequence <- newData$condition

# add a column called "reward" 
# Fill it with random entries, just for example. 
# This uses R's "sample" function
newData$reward <- sample(c("Reward", "NoReward"), nrow(newData), 
                         replace = TRUE)

# delete the "condition" column
newData <- subset(newData, select = -condition)

# now let's look at our new data
head(newData)
##   participant   rt accuracy taskSequence   reward
## 1           1 1660        1       Switch   Reward
## 2           1  913        1       Switch NoReward
## 3           1 2312        1       Repeat NoReward
## 4           1  754        1       Repeat   Reward
## 5           1 3394        1       Switch   Reward
## 6           1  930        1       Repeat   Reward

This now looks like how data typically comes in from a factorial design. Now, to get trimr to work on this, we need to create a new column called “condition”, and to place in this column the levels of all factors in the design. For example, if a trial in our newData has taskSequence = Switch and reward = NoReward, we would like our condition entry for this trial to read “Switch_NoReward”. This is simple to do using R’s “paste” function. (Note that this code can be adapted to deal with any number of factors.)

# add a new column called "condition".
# Fill it with information from our factors
newData$condition <- paste(newData$taskSequence, "_", newData$reward, sep = "")

# look at the data
head(newData)
##   participant   rt accuracy taskSequence   reward       condition
## 1           1 1660        1       Switch   Reward   Switch_Reward
## 2           1  913        1       Switch NoReward Switch_NoReward
## 3           1 2312        1       Repeat NoReward Repeat_NoReward
## 4           1  754        1       Repeat   Reward   Repeat_Reward
## 5           1 3394        1       Switch   Reward   Switch_Reward
## 6           1  930        1       Repeat   Reward   Repeat_Reward

Now we can pass this data frame to trimr, and it will work perfectly.

# trim the data
trimmedData <- sdTrim(newData, minRT = 150, sd = 2.5)

# check it worked
# (remove participant column so it fits in blog
# window)
head(trimmedData[, -1])
##   Switch_Reward Switch_NoReward Repeat_NoReward Repeat_Reward
## 1      1053.348        1054.317         702.924       769.549
## 2      1147.131        1106.321        1009.798      1030.902
## 3       998.140        1030.262         791.779       799.584
## 4      1105.244        1073.707         835.189       807.512
## 5      1191.217        1148.509         902.888       903.907
## 6      1483.667        1398.000        1087.029      1160.127
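
The same approach extends to three or more factors. As a small sketch, here it is with a third, hypothetical factor called congruency (invented for this example; it is not part of exampleData), using paste's sep argument to insert the underscores:

```r
# made-up trial-level data with three factors
toyData <- data.frame(
  taskSequence = c("Switch", "Repeat", "Switch"),
  reward       = c("Reward", "NoReward", "NoReward"),
  congruency   = c("Congruent", "Incongruent", "Congruent")
)

# join the levels of all three factors with underscores
toyData$condition <- paste(toyData$taskSequence, toyData$reward,
                           toyData$congruency, sep = "_")
toyData$condition
```

Each unique combination of levels then becomes its own column in the data frame that trimr returns.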

References

Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R package version 1.0.1.

Van Selst, M. & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. Quarterly Journal of Experimental Psychology, 47A, 631–650.

7 comments

  1. Cool! Have you considered adding the mixture model approach described by Ratcliff 1994? In that paper he ends up giving the method a not-so-strong endorsement, but suggests that there may be some situations where it would be useful. Anyway, could be cool to have another option in the package for users to experiment with and see if it yields sensible results.

    1. Hi – thanks for the comment. I plan to add some other approaches to the package, but I hadn’t considered this one by Ratcliff. I will have to check it out, so thanks!

  2. Hi, thanks for sharing this interesting package. I’ve tried to use this on my own data set, and found it inconvenient, because I have three variables and each has at least two conditions. So I can’t use trimr directly. Also, I guess it’s common to have more than two variables in psychological research. So maybe a future version could allow us to set the conditions.
    Nevertheless, it’s very nice for you to share this tool.

    1. Hi – you can use trimr even for factorial designs, so long as the column coding condition is set appropriately. Give me a little while and I will update my example vignette to include an example when you have more than one condition.

    2. Hi again – I have now updated the blog to show an example how to get trimr to work with data from factorial designs. If you have any questions about this, or need me to modify the script so it works on your data then please do let me know!
