Simulation

Low Power & Effect Sizes

Yesterday I posted the following tweet, which has since turned out to be my most popular tweet EVER, with hundreds of retweets and “likes” in 24 hours.

My motivation for the tweet was quite straightforward. I have recently been emailing academics in my department every week with different topics in an attempt to raise awareness of topics associated with increasing the information value of the research we are conducting as a department. This week’s topic was “Power”. In my email—in which I included a copy of Button et al.’s (2013) excellent paper on low-power in the neurosciences—I mentioned in passing that power is not just an issue for statistical significance. I have heard from people before that low power is only an issue when interpreting null results, and that if a study produces a significant outcome, then power is not an issue.

To pre-empt this response to my encouragement to increase the power of our studies, I said in my email: “Studies with low power and significant effects have been shown to over-estimate effect sizes, meaning your low-powered study—although significant—is not giving you precision.”

As soon as I sent the email, I realised that I couldn’t recall ever reading a study that had demonstrated this. Now, I knew that such a study (or studies) would have been conducted, but I realised that I had never actually read one myself. It turns out that such studies have indeed been conducted before, as people helpfully pointed out to me on Twitter in response to my tweet.

As I was unaware of these studies—plus it was a Sunday and I was a little bored—I thought that, instead of doing a literature search, I would code a simulation demonstrating the inflation of effect sizes in low-powered, significant studies. I emailed the results to my department to demonstrate that what I had said was indeed the case. Then I thought, “Well, I haven’t tweeted much this year, so why not put it on Twitter, too.”

The incredible engagement I have had with this tweet—I propose—is due to this being a rather under-appreciated fact. Indeed, I “knew” that low-powered studies over-estimate effect sizes, but I didn’t KNOW it in the sense that I had seen hard evidence for it.

 

Details of Simulation

Because my tweet was made in passing, I didn’t explain the simulation’s implementation in much detail. I discuss it here in case others want to extend the simulation in some way.

The effect size of interest is a measure of correlation between two measures. I arbitrarily chose IQ (mean = 100, SD = 20) and response time (mean = 600ms, SD = 80ms). I fixed the “true” effect size to be r = 0.3. It turns out that obtaining 80% power to detect r = 0.3 requires 85 subjects. In my simulation, I wanted to explore a wide range of sample sizes, so I chose the set 10, 20, 30, 50, 85, 170, 500, and 1000.

For each sample size—N—I simulated 1,000 “studies”. For each simulated study, the following procedure occurred:

  • Sample N draws from a multivariate normal distribution with the means and SDs for IQ and RT given above, and a population correlation coefficient of 0.3
  • Conduct a Pearson’s correlation between the two samples
  • If the correlation was significant, store the observed correlation coefficient in a new data frame
  • If the correlation was not significant, move on without storing anything
  • After 1,000 studies are completed, plot a boxplot of the observed effect sizes for N

The result was the image in the tweet.
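
For anyone who wants to tinker, below is a minimal sketch of that procedure in R. It is not the exact script behind the tweeted image—the seed, the use of the MASS package for multivariate sampling, and the plotting details are my own illustrative choices—but it implements the steps listed above.

# minimal sketch of the simulation described above
library(MASS)   # for mvrnorm()

set.seed(100)

rho <- 0.3                                       # the "true" effect size
sampleSizes <- c(10, 20, 30, 50, 85, 170, 500, 1000)
nStudies <- 1000                                 # simulated studies per sample size

# covariance matrix for IQ (SD = 20) and RT (SD = 80) with correlation rho
sigma <- matrix(c(20^2, rho * 20 * 80,
                  rho * 20 * 80, 80^2), nrow = 2)

results <- data.frame(n = integer(0), r = numeric(0))

for(n in sampleSizes){
  for(study in 1:nStudies){

    # sample N subjects' IQ and RT scores
    scores <- mvrnorm(n, mu = c(100, 600), Sigma = sigma)

    # Pearson's correlation between the two measures
    test <- cor.test(scores[, 1], scores[, 2])

    # store the observed correlation only if it is significant
    if(test$p.value < .05){
      results <- rbind(results, data.frame(n = n, r = test$estimate))
    }
  }
}

# boxplot of the significant effect sizes for each sample size
boxplot(r ~ n, data = results, xlab = "Sample Size (N)",
        ylab = "Observed (significant) correlation")
abline(h = rho, lty = 2, col = "red")            # the true effect size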

Limitations

This simulation has many limitations, and I point interested readers to the material cited above in others’ tweets for more formal treatments. I didn’t intend for this to be a rigorous test, so it shouldn’t be taken too seriously; it was more for my own curiosity, and also to provide a graphical image I could send to my colleagues at Keele to show the imprecision of effect sizes with low power. The particular outcomes are likely sensitive to my choice of means, SDs, r, etc. So, don’t generalise the specifics of this simulation, but maybe code your own tailored to your study of interest.

For me this was a bit of fun. Ten minutes of coding was time well spent on Sunday!


Statistics Tables: Where do the Numbers Come From?

This is a blog post for undergraduates grappling with stats!

Last week I was having a chat with an undergraduate student who was due to analyse some data. She was double-checking how to determine the statistical significance of her analysis. I mentioned that she could either use SPSS (which would provide the value directly), or obtain the t-value via hand-calculation and look up the critical value in the back of her textbook. Below is the type of table I was referring to; something all undergraduate students are familiar with. This one is for the t-test:

[Image: table of critical values for the t-distribution]

She then asked a very interesting question: “Where do all of these numbers come from, anyway?” I tried my best to explain, but without any writing utensils to hand, I felt I couldn’t really do the answer justice. So, I thought I’d write a short blog post for other curious students asking the same question: Where do these numbers come from?

A Simple Experiment

Let’s say a researcher is interested in whether alcohol affects response time (RT). She recruits 30 people into the lab, and tests their RT whilst sober. Then, she plies them with 4 pints of beer and tests their RT again. (Please note this is a poor design. Bear with me.) She finds that mean RT whilst sober was 460ms (SD = 63.63ms), and was 507ms (SD = 73.93ms) whilst drunk. The researcher performs a paired-samples t-test on the data, and finds t(29) = 2.646. Using the table above, she notes that in order for the effect to be significant at the 5% level (typically used in psychology), the t-value needs to exceed 2.043. As the observed value does exceed this “critical value”, she declares the effect significant (p<.05).
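
(As an aside, if you have R to hand, the exact p-value for this observed t-value can be pulled straight from the t-distribution; this is just an illustrative check rather than part of the hand-calculation procedure the student would follow.)

# two-tailed p-value for the observed t-value with 29 degrees of freedom
2 * pt(-2.646, df = 29)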

Students are familiar with this procedure, and perform it plenty of times during their studies. But how many have stopped to really ask, “Why does the t-value need to exceed 2.043? Why not 2.243? Or 1.043?”

A satisfactory answer requires an appreciation of what the p-value is trying to tell you. The correct definition of the p-value is the probability of observing a test statistic as extreme—or more extreme—as the one you have observed, if the null hypothesis is true. That is, if there is truly no effect of alcohol on RT performance, what is the probability you would observe a t-value equal to or higher than 2.646 (the one obtained in the analysis)? The statistics table tells us that—with 30 subjects (and therefore 29 degrees of freedom)—there is only a 5% chance of observing a t-value above 2.043 (or below -2.043). But where did this number come from?

The Power of Simulations

We can work out the answer to this question mathematically (and in fact this is often covered on statistics courses), but I think it is more powerful for students to see the answer via simulations. What we can do is simulate many experiments where we KNOW that the null hypothesis is true (because we can force the computer to make this so), and perform a t-test for each experiment. If we do this many times, we get a distribution of observed t-values when the null hypothesis is true.
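
Here is a rough sketch of that idea in R. It is not the exact script behind the animation below—the seed, the use of a paired t-test, and the number of experiments (matching the larger simulation reported later) are my own choices—but it shows the general recipe.

set.seed(1)

nExperiments <- 100000
nSubjects <- 30
tValues <- numeric(nExperiments)

for(i in 1:nExperiments){

  # two conditions where the null hypothesis is true: both drawn from N(0, 1)
  conditionA <- rnorm(nSubjects, mean = 0, sd = 1)
  conditionB <- rnorm(nSubjects, mean = 0, sd = 1)

  # store the observed t-value for this "experiment"
  tValues[i] <- t.test(conditionA, conditionB, paired = TRUE)$statistic
}

# distribution of t-values when the null hypothesis is true
hist(tValues, breaks = 100, xlab = "Observed t-value", main = "")
abline(v = c(-2.043, 2.043), col = "red", lwd = 2)

# percentage of null experiments more extreme than the critical value
mean(abs(tValues) > 2.043) * 100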

Animation of the Simulation

Below is a gif animation of this simulation collecting t-values. This simulation samples 30 subjects in two conditions, where the mean and standard deviation of each condition are fixed at 0 and 1, respectively. The gif only shows the collation of t-values up to 300 experiments. The histogram shows the frequency of the observed t-values as the number of experiments increases. The red vertical lines show the critical t-values for 29 degrees of freedom.

[Animation: histogram of null t-values building up as the number of simulated experiments increases]

Note that as the simulation develops, the bulk of the distribution of observed t-values falls within the critical values (i.e., they are contained within the limits defined by the red lines). In fact, in the long run, 95% of the distribution of t-values will fall within this window. This is where the critical value comes from! It is the value that 95% of t-values from null experiments will, in the long run, be less extreme than (that is, they will fall between -2.043 and 2.043).

A Larger Simulation

To show this, I repeated the simulation but now increased the number of experiments to 100,000. The histogram is below.

[Histogram: t-values from 100,000 simulated null experiments, with the critical values marked in red]

As before, the bulk of the distribution is contained within the critical t-value range. If we count exactly what percentage of these simulated experiments produced a t-value of 2.043 or greater (or less than -2.043), we see this value is 4.99%—almost exactly the 5% promised by the textbooks! Therefore, 95.01% of the simulated t-distribution falls within the red lines.

Summing Up

In this simulation, we repeated an experiment many times where the effect was known to be null. We found that 95% of the observed t-distribution fell within the range -2.043 to 2.043. This is what the critical values are telling us: when there is no real effect, 95% of observed t-values will, in the long run, be less extreme than the critical value. Therefore, so the argument goes, if you observe a more extreme value, this is reason to reject the null hypothesis.

The critical value changes depending on the degrees of freedom because the shape of the t-distribution under the null changes with the number of subjects in the experiment. For example, below is a histogram of null t-values in simulated experiments with 120 subjects. The textbooks tell us the critical value is 1.980. Therefore, we can predict that 95% of the distribution should fall within the window -1.980 to 1.980 (shown as the red lines below).
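
(For completeness, these tabled values are simply quantiles of the relevant t-distribution, which R will return directly via qt(); this is shown here as a quick check rather than as part of the simulation.)

# two-tailed 5% critical values of the t-distribution
qt(0.975, df = 29)    # 30 subjects in a paired design (df = 29)
qt(0.975, df = 119)   # 120 subjects in a paired design (df = 119)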

[Histogram: t-values from simulated null experiments with 120 subjects, with the critical values of ±1.980 marked in red]

That’s where the numbers come from!

flankr: Modelling Eriksen Flanker Task Performance

Do you do research using the Eriksen flanker task? Want to engage in some computational modelling of your data?

I’ve recently had a paper published in Behavior Research Methods reporting an R package that allows users to fit two recent computational models of performance in the Eriksen flanker task: the Dual-Stage Two-Phase (DSTP) model of Huebner et al. (2010), and the Shrinking Spotlight (SSP) model of White et al. (2011). The paper can be found here, or if you don’t have access to this journal, I have a pre-print of the article available here (but it doesn’t look as nice as the final journal version!). 

 

flankr

I have tried to make the paper as accessible as possible to researchers with no programming experience and no experience in computational modelling. To help with this, I have made a short (ish!) video discussing the main elements of the DSTP and the SSP models, as well as a brief walkthrough of how to use flankr. This will primarily be used for my students, but I thought others might find it of use, too.

If you have any questions/problems about the package then please don’t hesitate to contact me! The script I use in the video can be downloaded here.

https://www.youtube.com/watch?v=WYSh6YtP2Qs

 

Fooled by Probability

Teaching statistics to psychology undergraduates is one of the most demanding (yet rewarding) aspects of my job. The job is likely made harder by the fact that humans have a very poor “intuitive” grasp of probability. In preparation for a class this coming semester, I revisited one of my favourite examples of just how bad we are at guessing probabilities (I had first read about this example in the excellent book “Why do Buses Come in Threes?” by Rob Eastaway & Jeremy Wyndham).

Imagine you are teaching a class with 30 students in it. What is the probability that at least two of your students share the same birthday? Before you read on, have a think what you would estimate this probability to be…


The Answer (and Proof!)

Would you be surprised to know that there is about a 71% chance (p[matching birthday] = ~.71) that at least two people share the same birthday? I was flabbergasted when I first heard this, as I had estimated the probability to be very low (as do most people). I had expected that you would need about 180 people before you even had a .50 probability of finding people with the same birthday, so with only 30 students the probability must surely be far lower.

The book shows that, for any class size n, the probability that no two students share a birthday is

p(\text{no match}) = 1 \times \left(1 - \frac{1}{365}\right) \times \left(1 - \frac{2}{365}\right) \times \dots \times \left(1 - \frac{n - 1}{365}\right)

and so the probability of at least one shared birthday is p(n) = 1 - p(\text{no match}).
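
If you would rather not take the algebra on trust, the product can be checked in one line of R (a quick illustrative check, separate from the simulation reported below):

# probability of at least one shared birthday in a class of 30
n <- 30
1 - prod(1 - (1:(n - 1)) / 365)   # roughly .71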

Even with this proof in hand, I still couldn’t believe the result. So, I did what I always do when I don’t understand (or “get”) some mathematics I come across: I set up a simulation in R. 

Simulating the Problem

The simulation is straightforward. For each of 99 different class sizes (n = 2 to 100, in steps of 1), I populated the class with n random students, each with their own randomly selected birthday. Each birthday was drawn (with replacement) from the 365 days in the year, with every day equally likely. Then, I examined whether any two students shared the same birthday. For each class size, I repeated this procedure 10,000 times, and plotted the results.

The plots are found below. They show the percentage of simulated classes with matched birthdays against the size of the class being considered. I was interested in how sharply the percentage of matches rises as the class size increases, and in two questions in particular: 1) When does the probability reach .5? 2) When does the probability reach 1?

Even with the mathematical proof above, I was still surprised at the outcome.

[Plots: percentage of matched birthdays against class size, upper and lower panels]

Both plots are the same, but the upper plot answers my first question and the lower plot answers my second. It seems that you only need about 23 students (!!!) before you have a 50/50 chance of at least two students sharing the same birthday. This is WAY off my intuition that you would need about 180 students before you could even sniff at a 50% chance of a match.

The second plot shows that, to almost guarantee a match of birthdays, you need about 74 students.

We really are fooled by probability! Poor students!

 R Code for the Simulation

#------------------------------------------------------------------------------
### simulation parameters

# define vector of calendar dates
calendar <- seq(from = 1, to = 365, by = 1)

# how many simulations do you want per class size?
nSimulations <- 10000

# what is the size of each class? Here we have class sizes
# from 2 to 100
classSize <- seq(from = 2, to = 100, by = 1)
#------------------------------------------------------------------------------


#------------------------------------------------------------------------------
### start simulation here

# initiate vector to store data for each class size
classData <- numeric(length(classSize))

# initiate looping variable for class sizes
j <- 1

# loop over each class size
for(currSize in classSize){
  
  # initiate vector to store counts of replicated birthdays
  data <- numeric(nSimulations)
  
  # start the loop over number of simulated classes
  for(i in 1:nSimulations){
    
    # sample the students' birthdays
    dates <- sample(calendar, currSize, replace = TRUE)
    
    # count how many duplicated birthdays occurred; if any, flag this class
    count <- sum(duplicated(dates))
    
      if(count > 0){
        data[i] <- 1
      } 
      
  } # end of current class size
  
  # calculate the percentage of simulated classes that contained at least
  # one shared birthday
  classData[j] <- (sum(data) / nSimulations) * 100
  
  # update looping variable
  j <- j + 1
  
}
#------------------------------------------------------------------------------


#------------------------------------------------------------------------------
# now do some plotting
par(mfrow = c(2, 1))

plot(classSize, classData, ylab = "Percentage of Matched Birthdays", 
     xlab = "Class Size", pch = 2, type = "b")
segments(-2, 50, 22.8, 50, lty = 2, col = "red", lwd = 2)
segments(22.8, 50, 22.8, -2, lty = 2, col = "red", lwd = 2)
text(22, 5, pos = 4, "With ~23 students, p(matching birthday) = .5!", col = "red")

plot(classSize, classData, ylab = "Percentage of Matched Birthdays", 
     xlab = "Class Size", pch = 2, type = "b")
segments(-2, 100, 74, 100, lty = 2, col = "blue", lwd = 2)
segments(74, 100, 74, -2, lty = 2, col = "blue", lwd = 2)
text(74, 90, pos = 4, "With 74 students, 
     p(matching birthday) = 1!", col = "blue")
#------------------------------------------------------------------------------

(Don’t) Lose Your Inhibitions

[Cognitive inhibition can be defined as]…the stopping or overriding of a mental process, in whole or in part, with or without intention —MacLeod (2007)

The concept of cognitive inhibition is rather controversial in cognitive science. When performing a task or an action in the face of competing tasks/actions, is it sufficient to activate a representation of the relevant task/action? Or, does the activation of the competing task/action’s representation need to be inhibited (suppressed)? 

For example, if when presented with the following stimulus I ask you to name the red object, you will be hindered by the presence of an overlapping distractor (the green item). You are trying to activate an action of responding “Chair!”, but the presence of a distracting response “Duck!” hinders efficient selection compared to if the chair was presented alone.

[Image: a red chair overlapping a green duck]

So how do you perform this task successfully? Is it enough to activate the target representation so that it is above the activation level of the distractor’s representation? Or do you need to activate the target representation whilst at the same time inhibiting the distracting representation? There are people on both sides of this argument. Gordon Logan once stated there are two sets of researchers in cognitive psychology: inhibitophobes and inhibitophiles. Inhibitophiles are pro-inhibition, and use inhibitory constructs in every theory/explanation of cognitive operation. In contrast, inhibitophobes eschew inhibition in any explanation of empirical phenomena.

I argue there is a third camp, one in which I firmly place myself: inhibitosceptics. Whilst I accept a potential role for inhibition, I always like to see whether an empirical phenomenon ascribed to inhibition can be explained with non-inhibitory processes. This is just an application of Occam’s razor: why include activation + inhibition if activation alone is sufficient?

However, one thing can’t be denied: inhibition is an incredibly powerful—and efficient—selection mechanism. 

The Power of Inhibition

If one assumes—quite safely, given our knowledge of neural dynamics—that there is an upper limit to the activation of a particular representation, then in the face of competition it will not be sufficient to merely “boost” the activation of task-relevant representations: activation can only be raised so far, so the activation of the competing representation will still interfere with selection of the target representation.

By analogy, imagine a stereo playing your favourite CD in your room at home. Let’s say this stereo is on volume 7 (out of 10). Now, imagine you wish to switch to a new CD, and you do this by turning on a separate stereo (the analogy doesn’t work if you merely swap the CD in the old stereo, so bear with me!). In order to hear the new CD over the noise of the old CD, you need to turn the volume of the new stereo above 7. However, there is a limit to the volume of the stereo (10), so you can only turn the volume up so much. Even though the new stereo is playing at 10, you can still hear the old stereo playing at 7, which interferes with your enjoyment of the new CD. You need to turn the volume of the old stereo down as well as turning the volume of the new stereo up. (Or, as Spinal Tap did, you could just go to 11.)

This “turning the volume down” is inhibition. Coming back to selection of tasks/actions, you cannot merely activate the target representation above that of the distracting representation, because there is a ceiling to activation. Inhibitophiles will argue that you need to activate the target representation, and inhibit the competing representation. 

To demonstrate how elegant this selection mechanism is, I ran some simulations in R of the activation traces of two competing representations (a target representation and a distractor representation) during a task similar to that above: name the red object. (The details of the simulation are in a section below.) The two representations initially receive no input, and so their activations are near zero. At time 1,000, however, the stimulus onsets and stays on until the end. At this point, each representation receives some degree of activation (because both a red object and a green object are on the screen). The red representation receives a little more activation, representing the bias to name the red object (this could be attributed to an attentional weight, for example). Below is a plot of the activation traces with no inhibition in the system (i.e., a pure activation model).

[Plot: activation traces of the target and distractor with no inhibition in the system]

The time of stimulus onset is signified by the vertical red dashed line. As you can see, before this time both actions are equally (in)active, but once the stimulus onsets, activation begins to grow. As the excitatory input to the two units is very similar (0.65 for the red target, 0.60 for the green distractor), the growth of the two activations is similar, although the red representation becomes more active a short while later. If we assume that selection efficiency is inversely proportional to the similarity of the two activation traces, we can conclude that selection of the target action will not be very efficient, because the activation of the distracting representation is very high, too.

So, an activation-only model (at least, this variant) leads to strong interference from distracting actions. Let’s add some inhibition to the system. In the next plot, the activation input to each representation is the same as before, but now an inhibitory link exists between the two representations. That is, each representation spreads inhibition to its competitor, and the amount of inhibition sent is proportional both to how active the sender is and to how strong the inhibitory control of the system is: the more active a unit, and the stronger the system’s inhibitory control, the more inhibition is spread. This “winner takes all” process leads to rapid and efficient selection of the target item. The plot below shows activation traces with strong inhibition in the system.

[Plot: activation traces of the target and distractor with strong inhibition in the system]

 

As you can see, the target and distracting representations each begin to become active, but quickly the target representation sends strong inhibition to the distracting representation. The large vertical difference between these two traces suggests selection of the target action will be fast and accurate.

It turns out the system doesn’t need much inhibition at all to achieve this selection. Below is a plot where I incremented the inhibitory strength of the model in steps of 0.15 (the value is shown in the header of each panel).

[Plot: activation traces across increasing values of inhibitory strength, one panel per value]

Inhibition thus appears to be an efficient selection mechanism which overcomes the ceiling effect inherent in activation-only models. So—contrary to the popular “care-free” command—if you want efficient, error-free selection of relevant actions, don’t lose your inhibitions!

Simulation Details

The simulation uses a variant of Usher & McClelland’s leaky, competing accumulator model. The change in activation of the target at each time step is given by

dx_{Target} = \left(I_{Target} - \lambda x_{Target} - \beta x_{Distractor}\right)\frac{dt}{\tau} + \xi

and the change in activation for the distractor is given by

dx_{Distractor} = \left(I_{Distractor} - \lambda x_{Distractor} - \beta x_{Target}\right)\frac{dt}{\tau} + \xi

In these equations, I_{i} is the excitatory input into each unit, x_{i} is the current activation of unit i, \lambda is a leakage parameter (that is, the current activation of a unit is subject to decay), \beta is the inhibitory-strength parameter, \frac{dt}{\tau} is a time-step parameter (set to 0.05 in all simulations), and \xi is a Gaussian noise component (mean of zero, SD of 0.05).

The input to each unit was zero until time 1,000, at which point I_{Target} was set to 0.65 and I_{Distractor} to 0.60. The leakage parameter \lambda was set to 0.6 in all simulations, and the inhibitory-strength parameter \beta was the quantity varied across the panels of the six-panel plot above, with the value of \beta shown in the header of each sub-plot.
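
Putting those pieces together, below is a minimal R sketch of a two-unit version of this model. The input values, leakage, time-step, and noise SD follow the text above; the inhibitory strength of 0.45, the length of the traces, the truncation of activations at zero, and the plotting are my own illustrative choices rather than the exact script behind the figures.

set.seed(42)

nSteps  <- 3000     # total number of time steps (illustrative choice)
onset   <- 1000     # stimulus onset, as in the text
dtTau   <- 0.05     # time-step parameter dt/tau
lambda  <- 0.6      # leakage
beta    <- 0.45     # inhibitory strength (one illustrative value)
noiseSD <- 0.05     # SD of the Gaussian noise component

# excitatory input: zero before onset, then 0.65 (target) and 0.60 (distractor)
I <- matrix(0, nrow = nSteps, ncol = 2)
I[onset:nSteps, ] <- rep(c(0.65, 0.60), each = nSteps - onset + 1)

# activation traces for the target (column 1) and distractor (column 2)
x <- matrix(0, nrow = nSteps, ncol = 2)

for(t in 2:nSteps){
  noise <- rnorm(2, mean = 0, sd = noiseSD)

  # each unit: excitatory input, minus leakage, minus inhibition from the other unit
  x[t, 1] <- x[t - 1, 1] + (I[t, 1] - lambda * x[t - 1, 1] - beta * x[t - 1, 2]) * dtTau + noise[1]
  x[t, 2] <- x[t - 1, 2] + (I[t, 2] - lambda * x[t - 1, 2] - beta * x[t - 1, 1]) * dtTau + noise[2]

  # truncate activations at zero (an assumption, in keeping with typical LCA implementations)
  x[t, ] <- pmax(x[t, ], 0)
}

# plot the two activation traces and mark stimulus onset
matplot(x, type = "l", lty = 1, col = c("red", "green3"),
        xlab = "Time", ylab = "Activation")
abline(v = onset, col = "red", lty = 2)
legend("topleft", legend = c("Target", "Distractor"),
       col = c("red", "green3"), lty = 1)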

Response Time Modelling – How Many Trials?

Over the past year I have become increasingly interested in using models of simple decision making processes, as measured by two-choice response time tasks. A simple 2-choice task might ask participants to judge whether a presented number is odd or even, for example. One popular model of 2-choice RT is the Ratcliff Diffusion Model. A simplified example of this model is presented below.

[Figure: simplified illustration of the diffusion model]

The idea behind the model is that, once presented, the stimulus elicits an evidence-accumulation process. That is, the cognitive system begins sampling the stimulus, and evidence begins to accumulate in a noisy manner towards one of two response boundaries. Once a boundary has been reached, that response is executed. One boundary reflects the correct response, and the other reflects the incorrect response. The average rate of evidence accumulation (which is a drift diffusion process) is reflected in the drift rate parameter. The amount of evidence that needs to accumulate before a response is made is governed by the distance between the two boundaries (the boundary separation parameter). The starting point of the diffusion process lies between the two boundaries and is reflected in a bias parameter; the bias is typically midway between the two boundaries, as (for example) 50% of trials will present odd stimuli, so the participant has no reason to prefer one response over the other. The final parameter is called non-decision time, which reflects the time it takes to encode the stimulus and execute the motor response. The model is very complex mathematically and computationally, but implementing it has been made easier in recent years by the release of several software packages. Among these are E-J Wagenmakers and colleagues’ EZ diffusion model, Vandekerckhove & Tuerlinckx’s DMAT toolbox for MATLAB, and Voss and Voss’ Fast-DM. For an excellent overview of the diffusion model, as well as the available packages, see this paper by E-J Wagenmakers.

However, a new package (the “RWiener” package) has just been published by Wabersich and Vandekerckhove (see paper here) for R-Statistics. R is my favourite thing EVER (seriously), so I was thrilled to see an R package that can model response times.

How many trials?

The package is very straightforward; it can generate synthetic data with known parameters, as well as find best-fitting parameters for data you might have, which is superb. One question I was unsure of, though, is how accurate the parameter estimation routine is for varying numbers of trials in an experiment. That is, how many trials do I need, as a minimum, in my experiment for this package to produce accurate parameter estimates? With human data, this is a very difficult (impossible?) question to answer, but it is tractable when fitting the model to artificially generated data: because we know which parameters gave rise to the data (we tell the computer which to use), we can tell whether the estimated parameters are accurate by comparing them to the “known” parameters.

I set up a simulation where I addressed the “how many trials?” question. The structure is straightforward:

  • Decide on some “known” parameters
  • Generate N trials with the known parameters for X simulated “experiments”
  • Fit the model to each simulated experiment
  • Compare estimated parameters to known parameters
  • Repeat for various values of N

For the simulations reported, I simulated experiments with Ns of 25, 50, 100, 250, 500, and 1000 trials. This spans from the ridiculously small (N = 25) to the ridiculously large (N = 1000). For each N, I conducted 100 simulated experiments. Then, I produced boxplots for each parameter, which allowed me to compare the estimated parameters to the known parameters (shown as a horizontal line on each plot). Ns which recover the known parameters well will have boxplots which cluster closely around the horizontal line for each parameter.

[Boxplots: parameter estimates for each diffusion-model parameter, plotted against the number of trials]

Conclusion

The fitting routine is able to produce reasonably accurate parameter estimates with as few as 250 trials; beyond this, there is not much advantage to increasing trial numbers. Note that 100 trials is still OK, but more will produce better estimates. There is no systematic bias in these estimates at low trial numbers, except for the non-decision time parameter: its estimates tended to be too high with fewer trials, and this bias decreased as trial numbers were boosted.

The RWiener package is a welcome addition for fitting models to response time data, producing reasonably accurate parameter estimates with as few as ~250 trials.

 

R Code for the Above Simulation

The code for the simulation is below. (Note, this takes quite a while to run, as my code is likely very un-elegant.)

rm(list = ls())

library(RWiener)

#how many trials in each sample condition?
nTrials <- c(25, 50, 100, 250, 500, 1000)
#how many simulations for each sample?
nSims <- 100

#known parameters
parms <- c(2.00, #boundary separation
           0.30, #non-decision time
           0.50, #initial bias
           0.50) #drift rate

#what are the starting parameters for the parameter minimisation routine?
startParms <- c(1, 0.1, 0.1, 1) #boundary, non-decision, initial bias, & drift

#start simulations here
for(i in 1:length(nTrials)){ #for each number of trials (25, 50, etc...)

  #get the current number of trials to generate
  tempTrials <- nTrials[i] 

  #generate empty matrix to store results
  tempMatrix <- matrix(0, nrow=nSims, ncol=length(parms))

  for(j in 1:nSims){ #do a loop over total number of simulations required

    #generate synthetic data
    tempData <- rwiener(n=tempTrials, alpha=parms[1], tau=parms[2], beta=parms[3],
                        delta=parms[4])
    #now fit the model to this synthetic data
    findParms <- optim(startParms, wiener_deviance, dat=tempData, method="Nelder-Mead")

    #store the best-fitting parameters
    tempMatrix[j, ] <- findParms$par
  }

  #store the matrix of results as a unique matrix
  assign(paste("data", tempTrials, sep = ""), tempMatrix)
}

#do the plotting

par(mfrow=c(2,2)) #change layout to 2x2

#plot boundary
boxplot(data25[, 1], data50[, 1], data100[, 1], data250[, 1], data500[, 1], data1000[, 1],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Boundary Separation")
  abline(a=parms[1], b=0)

#plot non-decision time
boxplot(data25[, 2], data50[, 2], data100[, 2], data250[, 2], data500[, 2], data1000[, 2],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Non-Decision Time")
abline(a=parms[2], b=0)

#plot initial bias
boxplot(data25[, 3], data50[, 3], data100[, 3], data250[, 3], data500[, 3], data1000[, 3],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Initial Bias")
abline(a=parms[3], b=0)

#plot drift rate
boxplot(data25[, 4], data50[, 4], data100[, 4], data250[, 4], data500[, 4], data1000[, 4],
        names = nTrials, xlab = "Number of Trials", ylab = "Parameter estimate", main = "Drift Rate")
abline(a=parms[4], b=0)