Correlations are a popular analysis tool in psychology to examine the extent to which two variables are related. For example, one might be interested in whether there is a relationship between shoe size and height. But what if you want to explore the relationship between two variables whilst controlling for the effects of a third variable? For example, you might be interested in the relationship between shoe size and height whilst controlling for age. In order to do this, one needs to use partial correlations.
Whilst there are relatively straightforward methods to calculate partial correlations using frequentist statistics, I was recently interested in whether there is a Bayesian parameter estimation version of this analysis. A cursory glance at the internet didn't bring up much^{1}, so I set about developing my own.
Below, I briefly cover the frequentist implementation before providing an overview of the Bayesian parameter estimation version.
Frequentist Implementation
Assume we have three variables, X, Y, and Z. We are interested in the relationship between X and Y, but need to account for the effect of Z. I simulated 75 data points (these could be participants in a study), and calculating a standard Pearson's correlation between X and Y gives r_{x,y}(73) = 0.48, p < .05. A "significant" relationship! However, this doesn't take into account the effect of Z.
Examination of all possible correlations between X, Y, and Z reveals a strong relationship between Z and the other two variables:
- r_{x,z}(73) = 0.58, p < .05
- r_{y,z}(73) = 0.62, p < .05
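As a sketch, these pairwise correlations can be reproduced in base R on freshly simulated data (the simulation mirrors the get_data() function in the code at the end of the post, but the exact r values depend on the seed):

```r
# simulate three inter-related variables, with the same structure as the
# post's get_data() function
set.seed(42)
n <- 75
x <- rnorm(n, 0, 1)
y <- .5 * x + rnorm(n, 0, 1)
z <- .3 * x + .6 * y + rnorm(n, 0, 1)

# pairwise Pearson correlations, each tested with df = n - 2
r_xy <- cor.test(x, y)
r_xz <- cor.test(x, z)
r_yz <- cor.test(y, z)
```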
The relationship between X and Y whilst controlling for Z (denoted here as *r_{x,y|z}*) is given by the following formula:

r_{x,y|z} = (r_{x,y} − r_{x,z} r_{y,z}) / √[(1 − r_{x,z}²)(1 − r_{y,z}²)]
When calculating this partial correlation between X and Y, controlling for Z, we get r_{x,y|z}(72) = 0.19, p = .10. Note that the relationship between X and Y is no longer "significant". The figure below shows the fits for the regular (i.e., Pearson's) and partial correlations.
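For intuition, the formula can be checked against an equivalent route to the same quantity: correlating the residuals of X and Y after each has been regressed on Z. A minimal base-R sketch (on freshly simulated data, so the numbers won't exactly match those quoted above):

```r
# simulate data with the same structure as in the post
set.seed(42)
n <- 75
x <- rnorm(n, 0, 1)
y <- .5 * x + rnorm(n, 0, 1)
z <- .3 * x + .6 * y + rnorm(n, 0, 1)

# partial correlation via the formula
r_xy <- cor(x, y); r_xz <- cor(x, z); r_yz <- cor(y, z)
r_xy_given_z <- (r_xy - r_xz * r_yz) /
  (sqrt(1 - r_xz^2) * sqrt(1 - r_yz^2))

# equivalent route: correlate the residuals after regressing out z
res_route <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
```

The two routes agree exactly, which is a handy sanity check on the formula.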
Bayesian Parameter Estimation
This is all well and good, but what if you want a Bayesian estimate of the partial correlation parameter? I had a quick check of the internet, and couldn’t see anything. (I should say that I didn’t spend long looking; I like to do some things from scratch just so I can learn something, and this was one of those times.) Inspired by the SUPERB book by Lee & Wagenmakers on Bayesian modelling, I devised a Bayesian graphical model of partial correlation:
This model extends the one presented in the Lee & Wagenmakers book, which was used to estimate the correlation coefficient between just two variables. There are two extensions in the current model over that of Lee & Wagenmakers:
- The model is extended to accommodate three variables of interest instead of two. As such, three correlation parameters are estimated from the data: r_{x,y}, r_{x,z}, and r_{y,z}.^{2}
- The model has a new parameter (called *partial* in the code below), which denotes the partial correlation of interest; in the current example, that is r_{x,y|z}.
The model assumes, as does the one by Lee & Wagenmakers, that the data are modelled as draws from a multivariate normal distribution. The parameters of this distribution are the means of each of the three variables (denoted μ), their standard deviations (σ), and the correlation coefficients that link them (r_{x,y}, r_{x,z}, r_{y,z}). I use broad, rather uninformative priors (which of course can be tweaked later if one wishes).
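To make the reparameterisation concrete, this is how the model assembles the covariance matrix of the multivariate normal from the standard deviations and the three correlations (the values below are arbitrary illustrations, not estimates from the post):

```r
# arbitrary illustrative values for the standard deviations & correlations
sigma <- c(1.0, 1.2, 0.8)
r_xy <- 0.4; r_xz <- 0.5; r_yz <- 0.6

# covariance matrix: variances on the diagonal, r * sigma_i * sigma_j
# off the diagonal (mirroring the T[,] matrix in the JAGS code below)
V <- matrix(c(
  sigma[1]^2,                 r_xy * sigma[1] * sigma[2], r_xz * sigma[1] * sigma[3],
  r_xy * sigma[1] * sigma[2], sigma[2]^2,                 r_yz * sigma[2] * sigma[3],
  r_xz * sigma[1] * sigma[3], r_yz * sigma[2] * sigma[3], sigma[3]^2
), nrow = 3, byrow = TRUE)

# a valid set of correlations yields a symmetric, positive-definite matrix
stopifnot(isSymmetric(V), all(eigen(V, symmetric = TRUE)$values > 0))
```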
The parameter of interest, r_{x,y|z}, inherits the distributions of the model parameters pertaining to the correlation coefficients (r_{x,y}, r_{x,z}, r_{y,z}), and from them generates a new distribution for the partial correlation parameter (according to the first equation in this post). The distribution of interest for inference is now this new distribution pertaining to the partial correlation parameter.
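In practice this just means applying the partial-correlation formula to the MCMC output draw by draw. A sketch, with uniform random numbers standing in for real posterior samples of the three correlations:

```r
# stand-in draws for the posteriors of the three correlation parameters
# (in the real analysis these come from the JAGS samples)
set.seed(1)
n_draws <- 1000
r_xy <- runif(n_draws, 0.3, 0.6)
r_xz <- runif(n_draws, 0.4, 0.7)
r_yz <- runif(n_draws, 0.4, 0.7)

# apply the partial-correlation formula to every draw; the result is
# itself a sample from the partial-correlation posterior
partial <- (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz^2) * sqrt(1 - r_yz^2))
```

The vector `partial` can then be summarised like any other posterior sample (density plot, mode, HDI).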
Results
Recall that the frequentist estimate of the partial correlation coefficient was r_{x,y|z}(72) = 0.19. Below is a density plot of the posterior distribution of the partial correlation parameter from the Bayesian graphical model above^{3}.
The mode of this posterior distribution was 0.205, with a 95% Highest-Density Interval spanning from -0.03 to 0.39. Note that whilst the modal estimate of the partial correlation parameter was close to that of the frequentist analysis, the Bayesian parameter estimation provides much more information, in particular regarding the uncertainty of this estimate.
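For completeness, the mode and 95% HDI can be computed directly from any posterior sample. The full code below uses coda's HPDinterval, but a simple shortest-interval version in base R looks like this (toy normal draws stand in for the real posterior):

```r
# simple shortest-interval HDI: sort the sample, slide a window covering
# 95% of the points, and keep the narrowest such window
hdi_95 <- function(samples, prob = 0.95) {
  sorted <- sort(samples)
  m <- length(sorted)
  n_keep <- ceiling(prob * m)            # points inside the interval
  starts <- 1:(m - n_keep + 1)
  widths <- sorted[starts + n_keep - 1] - sorted[starts]
  best <- which.min(widths)              # narrowest window = HDI
  c(lower = sorted[best], upper = sorted[best + n_keep - 1])
}

# toy posterior sample
set.seed(7)
draws <- rnorm(5000, mean = 0.2, sd = 0.1)

# mode via kernel density estimate, plus the HDI
dens <- density(draws)
posterior_mode <- dens$x[which.max(dens$y)]
hdi <- hdi_95(draws)
```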
Conclusion
I am not sure if this implementation is correct, so I advise using it with caution. But it proved an interesting exercise, and I am not aware of an existing implementation of this analysis.
Code
#------------------------------------------------------------------------------
### initial set up

rm(list = ls())

# set working directory
setwd("D:/Work/Other/Blog_YouTube code/Blog/Bayesian Partial Correlation")

# load libraries
library(ppcor)
library(Hmisc)
library(R2jags)
library(coda)   # for as.mcmc() & HPDinterval() used below

# set seed for reproducible code
set.seed(42)
#------------------------------------------------------------------------------


#------------------------------------------------------------------------------
### define functions

# function to simulate partially correlated data
get_data <- function(n){
  x <- rnorm(n, 0, 1)
  y <- .5 * x + rnorm(n, 0, 1)
  z <- .3 * x + .6 * y + rnorm(n, 0, 1)
  data <- data.frame(x = x, y = y, z = z)
  return(data)
}
#------------------------------------------------------------------------------


#------------------------------------------------------------------------------
### get data & conduct frequentist analysis

# generate the data
n <- 75
x <- get_data(n)

## plot the data with linear model fits
op = par(cex.main = 1.5, mar = c(5, 6, 4, 5) + 0.1, mgp = c(3.5, 1, 0),
         cex.lab = 1.5, font.lab = 2, cex.axis = 1.3, bty = "n", las = 1)

# do pairs plot
pdf("pairs.pdf", width = 6, height = 6)
pairs(x, upper.panel = NULL, pch = 19)
dev.off()

# do correlation plot
pdf("correlation.pdf", width = 6, height = 6)
plot(x$y, x$x, pch = 17, ylab = "", xlab = "")

# model not controlling for z
mod_1 <- lm(x$x ~ x$y)
abline(a = mod_1$coefficients[1], b = mod_1$coefficients[2], lwd = 3,
       lty = 1, col = "red")

# model controlling for z
mod_2 <- lm(x$x ~ x$y + x$z)
abline(a = mod_2$coefficients[1], b = mod_2$coefficients[2], lwd = 3,
       lty = 1, col = "blue")

legend("bottomright", c("Regular", "Partial"), lty = c(1, 1), bty = "n",
       lwd = 3, col = c("red", "blue"), cex = 1.5)
dev.off()

# get the frequentist estimate of correlation & partial correlation
# note that the correlation between x and y is no longer significant when
# controlling for z
freq_r <- rcorr(as.matrix(x))
freq_partial_r <- pcor(x)
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
### Conduct Bayesian parameter estimation

# declare the JAGS model code
model_code <- "
model {
  # data
  for(i in 1:n){
    x[i, 1:3] ~ dmnorm.vcov(mu[], TI[,])
  }

  # priors
  mu[1] ~ dnorm(0, .001)
  mu[2] ~ dnorm(0, .001)
  mu[3] ~ dnorm(0, .001)
  lambda[1] ~ dgamma(.001, .001)
  lambda[2] ~ dgamma(.001, .001)
  lambda[3] ~ dgamma(.001, .001)
  r_xy ~ dunif(-1, 1)
  r_xz ~ dunif(-1, 1)
  r_yz ~ dunif(-1, 1)

  # reparameterisation
  sigma[1] <- 1/sqrt(lambda[1])
  sigma[2] <- 1/sqrt(lambda[2])
  sigma[3] <- 1/sqrt(lambda[3])

  T[1, 1] <- 1 / lambda[1]
  T[1, 2] <- r_xy * sigma[1] * sigma[2]
  T[1, 3] <- r_xz * sigma[1] * sigma[3]
  T[2, 1] <- r_xy * sigma[1] * sigma[2]
  T[2, 2] <- 1 / lambda[2]
  T[2, 3] <- r_yz * sigma[2] * sigma[3]
  T[3, 1] <- r_xz * sigma[1] * sigma[3]
  T[3, 2] <- r_yz * sigma[2] * sigma[3]
  T[3, 3] <- 1 / lambda[3]
  TI[1:3, 1:3] <- T[1:3, 1:3]

  # partial correlation calculation
  num <- r_xy - (r_xz * r_yz)
  denom <- sqrt(1 - pow(r_xz, 2)) * sqrt(1 - pow(r_yz, 2))
  partial <- num/denom
}
"

# model details
jags_info <- list("x", "n")
parameters <- c("r_xy", "r_xz", "r_yz", "mu", "sigma", "partial")

# fit the model
sample <- jags(jags_info, inits = NULL, parameters,
               model.file = textConnection(model_code),
               n.chains = 1, n.iter = 10000, n.burnin = 500, n.thin = 5,
               DIC = F)

# look at the overview of the parameter estimates
sample

# extract the posterior samples of the partial correlation (by name, to
# avoid relying on column position) & calculate the 95% HDI
posterior <- sample$BUGSoutput$sims.matrix[, "partial"]
sample_mcmc <- as.mcmc(posterior)
hdi <- HPDinterval(sample_mcmc)

### plot
# do some preparation for plotting by finding the mode of the posterior
dens <- density(posterior)
posterior_mode <- dens$x[which.max(dens$y)]

# do the plot
pdf("bayesian_estimate.pdf", width = 6, height = 6)
plot(dens, xlim = c(-1, 1), ylim = c(0, max(dens$y) + 0.55), main = "",
     xlab = "Partial Correlation Estimate", lwd = 2)

# add the mode of the sample & the HDI etc.
lines(x = c(posterior_mode, posterior_mode), y = c(2.5, 3.8), lty = 2,
      lwd = 2, col = "red")
text(x = posterior_mode, y = max(dens$y) + 0.5,
     paste("Posterior mode =", round(posterior_mode, 3), sep = " "),
     cex = 1.2, col = "red")
lines(x = c(hdi[1], hdi[1]), y = c(0, 0.2), lwd = 2, col = "red")
lines(x = c(hdi[2], hdi[2]), y = c(0, 0.2), lwd = 2, col = "red")
lines(x = c(hdi[1], hdi[2]), y = c(0.1, 0.1), lwd = 2, col = "red")
text(x = (hdi[1] + hdi[2]) / 2, y = 0.325, "95% HDI", cex = 1.2, col = "red")
dev.off()
#------------------------------------------------------------------------------
1. Note that there is a paper by Wetzels & Wagenmakers (2012) which demonstrates the calculation of Bayes factors for correlations and partial correlations using summary statistics (i.e., Pearson's r and the degrees of freedom). Note also (as I say in the main post) that I didn't search very hard for a solution to this problem, as I was keen to develop my own method. So, there is probably a better way of doing this that is already available. ↩
2. Note that in the model of Lee & Wagenmakers with just two variables, one must take the inverse of the variance–covariance matrix when passing it to the dmnorm function in JAGS. With three or more variables, taking this inverse can fail because the matrix is then not positive definite. (See this link for the help I received from the JAGS maintainer on this issue.) The solution involves using JAGS 4.3.0 or newer and passing the covariance matrix directly to the dmnorm.vcov function instead. ↩
3. As this is just a toy example, I kept the model-fitting time as short as possible. I generated 10,000 samples from the posterior distributions, treating the first 500 as burn-in samples. The thinning rate was set to 5, and just one chain was used. This is not optimal, but fitting the model was quite slow. ↩