It’s been a year since the Open Science Collaboration’s paper “Estimating the Reproducibility of Psychological Science” was published in Science. It has been cited 515 times since publication, and has been met with much discussion on social networks.

I am interested in what changes your psychology department has made since then. Are staff actively encouraged to pre-register all studies? Do you ask faculty members for open data? Are faculty members asked to provide open materials? Do ethics panels check the power of planned studies? Have you embedded Open Science practices into your research methods teaching?

I am preparing a report for my department on how we can address the issues surrounding the replication crisis, and I would be very interested to hear what other departments have done to address these important issues. Please comment on this post with what your department has done!

I Heart Statistics

Students often look at me oddly when I try to express just how much I love statistics. Colleagues look at me oddly when I try to express how much I love computer simulation. Yes, they both can be dull (I suppose; I don’t see it, but whatever…), but they can also be pretty damn cool. I’m always on the lookout for examples of how statistics have been applied in neat and interesting ways, mostly for my own geeky interest, but also to potentially use to promote the utility of statistics to students.

This morning I came across a tweet advertising a talk tonight in London by Dr Ruth King at the London Mathematical Society. The title of the talk is “How to count invisible people”. I was instantly interested! The question being addressed is how you count people who don’t wish to be counted, such as illegal immigrants in the United Kingdom. A related problem is how to estimate the size of a given population if you can’t access the whole population.

It turns out you can arrive at a pretty decent estimate of the size of a population using something called the Lincoln-Petersen estimator, which I had not heard of before. It’s pretty neat! It demonstrates nicely the utility of statistics: they allow you to infer something about a whole population from what you have measured in only a sample of it.

Say you wish to estimate the population of a large city. What you can do is go out on one particular day, approach a set number of people (say 5,000), and take their names on a list (let’s call this list A). Then, come back on another day soon after, approach another set number of people (say another 5,000), and put their names on a second list (list B). Then, count the number of people who appear on both lists. Using the Lincoln-Petersen estimator, you can arrive at a pretty decent estimate of the total population using the following equation:

$\text{Estimated Population} = \frac{N_{\text{list A}} \times N_{\text{list B}}}{N_{\text{both lists}}}$

where N refers to “number”.
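
As a quick worked example (the overlap of 500 names is a made-up number for illustration, not a result from the simulations): suppose list A and list B each hold 5,000 names, and 500 names appear on both. The estimate is then:

$\text{Estimated Population} = \frac{5000 \times 5000}{500} = 50000$

The intuition is that 500 of the 5,000 people on list B (10%) were also on list A, so list A is taken to have captured about 10% of the whole population, which puts the population at roughly ten times the length of list A.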

Simulating the Accuracy of the Lincoln-Petersen Estimator

How cool is that?? But does it work? If it does work, how accurate is it? I was skeptical that if you had 5,000 people on list A and 5,000 people on list B that you could accurately estimate the population size if the TRUE (but unknown) size was significantly larger, say 5 million. So, I wanted to test this.

I used computer simulation to arrive at an estimate of how accurate the formula is. Computer simulation is ideal for this as you can tell the computer what the true population size of a simulated city is. Thus, we can compare the estimate from the Lincoln-Petersen equation with the true and known population size. This can’t be done in real life, where the true size is unknown.

In my first simulation, I set the true population size to 50,000. I “approached” 5,000 people for list A, and 5,000 people for list B. Each list therefore covers a large share (10%) of the true population, because I wanted to test how well the formula worked under very favourable circumstances. To get an estimate of the error in the Lincoln-Petersen formula, I repeated this census-taking 10,000 times and plotted the resulting estimates as a boxplot, which shows how much they vary from census to census.

The figure below shows the result of this simulation. As can be seen, the median of the simulated censuses is spot-on with the “true” population size (shown as the horizontal red line). The whiskers span roughly the median +/- 10%, which is pretty good!

But this simulation was favourable for the equation, as the number sampled (5,000) was pretty close to the true population size (50,000). In the next simulation, I repeated the above simulation across a wide range of true population sizes, from 25,000 up to 5 million. It’s important to note that in each simulation I still only sampled 5,000 people on each day. The result is below.

As can be seen, the simulation median always captures the true population size, although the variance in this estimate increases as the true population size increases. This is to be expected. But, the formula still works amazingly well. You can pretty accurately estimate a true population of 5 million people just by sampling 5,000 people! Amazing!
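
One way to see why the spread grows is a back-of-the-envelope calculation. It assumes both lists are simple random samples from the same population, and the symbol N_True (the true population size) is introduced here just for this sketch. Under that assumption, the expected number of people on both lists is:

$E[N_{\text{both lists}}] = \frac{N_{\text{list A}} \times N_{\text{list B}}}{N_{\text{True}}}$

With 5,000 people on each list, that is 25,000,000 / 50,000 = 500 shared names for a true population of 50,000, but only 25,000,000 / 5,000,000 = 5 shared names for a true population of 5 million. When the estimate hinges on a count as small as 5, one name more or fewer shifts it a lot: 4 shared names gives an estimate of 6.25 million, while 6 shared names gives about 4.17 million.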

I was really impressed by this, and it’s a great example of the utilisation of statistics to get at real-world problems. So, if you’re in London tonight, go to that talk. I will be sad to miss it!

Here is the R code for the above simulations:

# Lincoln–Petersen estimator
#------------------------------------------------------------------------------
### First simulation: just one known population

# what is the true number in the population?
nTrue <- 50000

# how many simulated censuses to take?
nSims <- 10000

# how many people to approach on each day?
nFirst <- 5000
nSecond <- 5000

# vector to store the estimate from each simulated census
estimates <- numeric(nSims)

## conduct the simulation!
for(i in seq_len(nSims)){

  # take the first census (list A)
  a <- sample(1:nTrue, size = nFirst, replace = FALSE)

  # take the second census (list B)
  b <- sample(1:nTrue, size = nSecond, replace = FALSE)

  # how many people are on both lists?
  nBoth <- sum(a %in% b)

  # calculate the estimated total population using the
  # Lincoln–Petersen estimator; the estimate is undefined when no one
  # was present on both days, so store NA (boxplot() ignores NAs)
  if(nBoth == 0){
    estimates[i] <- NA
  } else {
    estimates[i] <- (nFirst * nSecond) / nBoth
  }
}

# plot the data
boxplot(estimates, ylab = "Estimated Population Size", main = nTrue)
abline(h = nTrue, lwd = 2, lty = 2, col = "red")
#------------------------------------------------------------------------------

#------------------------------------------------------------------------------
### Second simulation: a range of known populations

# change the plot to a 3x2 grid
par(mfrow = c(3, 2))

# disable scientific notation
options(scipen = 999)

# what true populations to explore?
population <- c(25000, 100000, 250000, 500000, 1000000, 5000000)

# how many simulated censuses to take?
nSims <- 10000

# how many people to approach on each day?
nFirst <- 5000
nSecond <- 5000

# iterate over each true population, and run the simulations
for(currPop in population){

  # vector to store the estimate from each simulated census
  estimates <- numeric(nSims)

  for(i in seq_len(nSims)){

    # take the first census (list A)
    a <- sample(1:currPop, size = nFirst, replace = FALSE)

    # take the second census (list B)
    b <- sample(1:currPop, size = nSecond, replace = FALSE)

    # how many are on both lists?
    nBoth <- sum(a %in% b)

    # calculate the estimated total population using the
    # Lincoln–Petersen estimator; the estimate is undefined when no one
    # was present on both days, so store NA (boxplot() ignores NAs)
    if(nBoth == 0){
      estimates[i] <- NA
    } else {
      estimates[i] <- (nFirst * nSecond) / nBoth
    }
  }

  # plot the data
  boxplot(estimates, ylab = "Estimated Population Size", main = currPop)
  abline(h = currPop, lwd = 2, lty = 2, col = "red")
}
#------------------------------------------------------------------------------
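
A possible shortcut, offered as a sketch rather than as the method used above: if both lists are simple random samples drawn without replacement from the same population, the number of people on both lists follows a hypergeometric distribution. That means the overlap can be drawn directly with base R's rhyper() instead of building both lists. The seed value and the choice to store NA when the lists share no one are assumptions made for this illustration.

```r
# Sketch: draw the overlap count directly rather than sampling both lists.
# Assumption: both lists are simple random samples without replacement.
set.seed(1)  # arbitrary seed, only so the run can be repeated

nTrue   <- 50000   # true population size
nFirst  <- 5000    # people on list A
nSecond <- 5000    # people on list B
nSims   <- 10000   # number of simulated censuses

# number of people on list B who were also on list A, one value per census:
# list A "marks" nFirst people, and list B draws nSecond people from the
# nFirst marked and (nTrue - nFirst) unmarked members of the population
nBoth <- rhyper(nSims, m = nFirst, n = nTrue - nFirst, k = nSecond)

# Lincoln-Petersen estimate per census; NA where the lists share no one
estimates <- ifelse(nBoth > 0, (nFirst * nSecond) / nBoth, NA)

# the median estimate can be compared against nTrue
median(estimates, na.rm = TRUE)
```

Because nothing is sampled name by name, this runs much faster than the loops, which makes it easier to try larger populations or different list sizes.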


Dealing with Rejection

If there is one thing academics need to get used to very quickly it’s rejection. I’ve had a multitude of papers rejected, I’ve had several grants (large and small) rejected, and I even once had a conference submission rejected (yep – it CAN happen). There’s no other way to say it: rejection hurts.  It’s difficult—sometimes it feels impossible—but I try to see the positive side of rejections where I can.

I clearly remember finding my first rejection less difficult than I had anticipated. It was within the first three months of my PhD, and I had the opportunity to submit the first experiment of my project as a short commentary paper. When the reviews came back, I was disappointed to see it had been rejected; in fact, this is likely an understatement, as I had high hopes it would be lauded and immediately put on the front page of the journal (OK, this is an exaggeration, but I thought it was a nice little paper). Despite my disappointment, I remember the next day reading the reviews again through fresh eyes, and smiling to myself. It was an unexpected smile; one of those that take you by surprise when you suddenly realise you’re doing it, and you’re not sure why. I had the sudden realisation that my work—yes, MINE!—had been looked at by three experts in my field. These researchers—whom I respected greatly—had taken the time to look at my work and provide me with valuable feedback. Was it all positive? No, of course not, but there were positive aspects. Was it all negative? No, of course not, but there were negative aspects. The point is, I had received critical feedback on my work from three experts in my field. What a learning experience!

That’s not to say I’ve always found rejection plain sailing. Despite the positives to be gained from the reviews of a rejected paper, I’ve recently had a bad run of rejections which I found very difficult: seven consecutive paper rejections (not all the same paper). After a while (maybe the fourth?), it was difficult not to start questioning myself: Am I up to this job? Imposter syndrome had kicked in royally. This rejection-fest has recently ended, and I had a paper accepted the other day. This paper contains the work I am most proud of to date, but imposter syndrome is still here.

It never helps my cause that I also have a very bad habit of looking at people at similar stages of their career to me and looking at their extensive CVs. I have an even worse habit of looking at professors’ CVs and trying to work out how I shape up in comparison to them when they were at my stage of career. Some of these comparisons give me hope; other comparisons leave me feeling even more incapable. (Does anyone else do this, too? If you don’t, DO NOT start doing it; what a complete waste of time and energy!)

These negative feelings are common, and the more academics I speak to the more I realise I’m not the only one who feels this way. My greatest discovery was finding out that other people have rejections, too; you can’t help but feel sometimes that you are the only one! This realisation came from when I started to review other people’s work, as reviewers get cc’d in to the decision letter (“Bloody hell, even Professor XXX gets rejections!”). I’m not the only one, it seems, and neither are you.

How do I deal with rejection?

I feel that I have a pretty broad back when it comes to rejections. Yes, I feel crap about it for a little while. But, I try not to let it dominate my thoughts. I have a pretty good routine for dealing with rejections, which I want to briefly outline below. My process is not novel, and I remember reading something similar from someone else, but for the life of me I can’t recall where I saw it.

• When I receive the decision letter, I read the editor’s comments first (obviously). This tends to be a panic-stricken scan for the word “unfortunately” rather than a comprehensive read, but I can quickly assess the damage.
• I read through the reviewers’ comments once. I aim to get the broad “feel” for the issues that have been raised, but at this stage I don’t focus on the details so much.
• I put the reviews away in an email folder.
• I do not look at the reviews again for at least a couple of days. This is my “licking my wounds” phase. I try to fill it with as many positive things as I can. If I’m fortunate enough to have another paper I am working on, I continue working on it during this phase. For me, this is very important and serves two purposes: to take my mind off the rejection, and to make me feel like I am still progressing (doing nothing during this time risks making you feel like you’ve been rejected and there’s nothing you can do about it).
• After a few days, I return to the reviews. I often realise at this stage that the reviewers actually raised some very insightful and important points. I note all of these down.
• In a revision—either back to the same journal or in revision to submit the paper elsewhere—I make sure I deal with ALL comments. This doesn’t mean I do everything a reviewer asks for, but I do make sure I can defend why I haven’t done a particular thing.
• I craft my revision letter. This often turns into a very lengthy document. I’ve had revision letters that have been just as long as the manuscript I’m revising. I outline in detail every point that every reviewer raised, and point to where in the manuscript the change is, or, if I didn’t agree with a point, I explain why I haven’t acted on it. I want to leave no doubt in the editor’s mind that I’ve thought deeply about the issues raised, and I’ve acted in an open and responsive manner. I don’t do it for brownie-points; I do it because I take the reviewers’ comments very seriously and I want to ensure I provide a comprehensive response.
• I acknowledge the work of the reviewers in my revision letter. They’ve taken time out of their busy research schedules to assess my work and I am always grateful for that, whether they recommend acceptance or not.
• I also try to remember some sage advice: you’re only being rejected because you’ve been productive enough to submit something.

Ironically enough, as I was writing these last points I had another rejection through (a small teaching-related grant). Time to go lick my wounds and start the cycle all over again…

The benefit of a lab book

I always thought that use of a lab book (a dedicated space to note down experimental methods, results etc.) was largely restricted to sciences like chemistry or physics, where one might conduct several experiments per week, and thus tracking progress is essential lest you lose your train of thought. What role do they have in psychology, where the pace of experimentation, both of the experiment itself and the time between each experiment, is arguably much more sedate?

Since January this year I’ve been keeping my own lab book, and it has boosted the organisation of my thoughts and progress considerably. It’s just a simple Word document, organised by themes. I have chapters titled “Experiments”, “Models”, and “Research Ideas”. It has a table of contents, list of figures, and list of tables. It also has a references section. Its layout is very much like a thesis.

Now, in one place, I log all of my experiments in as much detail as I would in a paper submission (minus the protracted introduction & discussion). Usually I would write up only those experiments which “worked”, in that I would be preparing them for submission, but it has been a revelation writing up all of my experiments. All of them were important enough to me to run; all were sufficiently powered and, in my opinion, adequately designed to address a question I have, so why not write them up somewhere? It’s better than letting the data rot in an electronic file drawer.

Keeping everything in one place has made me feel much more in control of my work. The enhanced feeling of organisation that comes from having one lab book is liberating. I carry a dictaphone around with me in case I have thoughts whilst in a situation I can’t write in; now, I put the audio file in Dropbox and add an entry to my lab book with a link to this file. I also carry a small notebook around with me; instead of keeping notes contained in this book, I can take a snapshot with my smartphone and enter the photo as a figure in my lab book.

It also gives me a feeling of accomplishment. Academia is famed for delayed rewards (if they come at all), so it is nice to be able to look back and see that I have made some progress, even if most of it will never see the light of day. Science is a cumulative process, and the steps that make this progress are small, and often made behind closed doors; published work doesn’t always reflect these small steps, so I like having a permanent record of mine.

The greatest benefit has come from organising the modelling work which I try to do. Modelling requires a lot of tedious steps, most of which get scrubbed from final reports: what did you try first that didn’t work? What tests did you do to check the model code was bug-free? Did you do parameter-recovery simulations? What about testing for model mimicry and model-recovery simulations? Each of these steps requires new scripts of code, new parameters, and new data. Before now, I would continuously update one script to cover all of these stages, and the final script would be long and complex with little acknowledgement of its heritage. I would know that I’d conducted these stages, but I wouldn’t log the results anywhere. I would just “know” that they were complete. Now, I log each of these stages, and their results, in my lab book, together with links to archived code for each stage. This makes the process much cleaner, and I feel more confident about the final product.

Open Lab Book?

One step I have not yet had the confidence to take is to go open; open lab books are those which are kept “live” on the internet, so others can see them.

I see many advantages to this: notably, science is, or at least should be, an open dialogue, so why not let others see what I’m doing? Perhaps I would get some comments/ideas that aid my research. Of course, there is the fear of being “scooped”, but I don’t think this is a fear worth entertaining (at least, not for the work I do). What puts me off doing it is that it would change the way I write in my lab book. By writing just for me, I can be more economical with explanations. I can also be free to be more informal with my thoughts (“Why didn’t that bloody experiment work?”). And yes, I can also hide some of the awful ideas I have.

I prefer to use my lab book primarily as a way to organise my thoughts. With increasing demands on my time, the organisational boost it has brought has been worth its weight in gold. I’m also looking forward to looking back in 5 years’ time at all the work I have done.

Try it!

New blog – so what?

I was hesitant to start a personal blog about my academic interests. There are so many excellent psychology-related bloggers out there (Rolf Zwaan’s and Dorothy Bishop’s are among my faves); who am I to say anything of interest to potential readers when they could be reading those blogs? But then I thought, why not just write for yourself, Jim? And so here I am.

I already blog about research methods in psychology at http://www.researchutopia.wordpress.com. This is primarily aimed at undergraduate students who are sailing the turbulent seas of “god we hate statistics”. My rate of new posts is somewhat low, often hindered by the fact I find myself wanting to write about something that is either a) beyond the scope of that blog, or b) too esoteric for a statistics blog aimed at undergraduates. Therefore, this blog will serve as my outlet for discussing things that don’t quite fit into my other one. Among these topics will be academic papers I’ve found interesting, questions I’m working on in my own research, trends in cognitive psychology, statistical issues, data analysis tidbits, bits of computer code (primarily R; for those unaware, I’m a bit of an R-addict), and so on.

I’m hoping that the blog ends up more coherent than it sounds like it will be at this stage. As all posts will be work-related (and when you’re an early-career academic, isn’t everything work-related?), the posts will reflect my path as I wander through academic life.

Every journey begins with a single step…