MAS8381: Statistics for Big Data

One-way ANOVA with fixed and random effects

In this practical we will analyse a simple one-way ANOVA model from a Bayesian perspective, making sure that we understand the difference between fixed and random effects in a Bayesian context where everything is random, as well as the associated identifiability issues.

We will consider the body mass index (BMI) of new male undergraduate students at a selection of UK Universities. Let us suppose that our data consist of measurements of (log) BMI for a random sample of 1,000 males at each of 8 Universities. We are interested to know if there are any differences between the Universities. Again, we want to model the process as we would simulate it, so thinking about how we would simulate such data is instructive. We start by assuming that the log BMI is a normal random quantity, and that the variance is common across the Universities in question (this is quite a big assumption, and it is easy to relax). We assume that the mean of this normal distribution is University-specific, but that we do not have strong prior opinions regarding the way in which the Universities differ. That said, we expect that the Universities would not be very different from one another.

Simulating data

A simple simulation of the data with some plausible parameters can be carried out as follows.

set.seed(1)
Z=matrix(rnorm(1000*8,3.1,0.1),nrow=8)
RE=rnorm(8,0,0.01)
X=t(Z+RE)
colnames(X)=paste("Uni",1:8,sep="")
Data=stack(data.frame(X))
boxplot(exp(values)~ind,data=Data,notch=TRUE)

Make sure that you understand exactly what this code is doing before proceeding.

Frequentist analysis

We will start with a frequentist analysis of the data. The model we would like to fit is

Y[i,j] = mu + theta[i] + eps[i,j]

where i is an indicator for the University and j for the individual with a particular University. We know that this is not actually identifiable when the model parameters are all treated as "fixed effects", but R will handle this for us.

mod=lm(values~ind,data=Data)
summary(mod)

We see that R has handled the identifiability problem using "treatment contrasts", dropping the fixed effect for the first university, so that the intercept actually represents the mean value for the first University, and the effects for the other Univeristies represent the differences from the first University. If we would prefer to impose a sum constraint, then we can switch to sum contrasts with

options(contrasts=rep("contr.sum",2))

and then re-fit the model.

mods=lm(values~ind,data=Data)
summary(mods)

This has 7 degrees of freedom for the effects, as before, but ensures that the 8 effects sum to precisely zero. This is arguably more interpretable in this case.

Bayesian analysis

We will now analyse the simulated data from a Bayesian perspective, using JAGS.

Fixed effects

All parameters in Bayesian models are uncertain, and therefore random, so there is much confusion regarding the difference between "fixed" and "random" effects in a Bayesian context. For "fixed" effects, our prior captures the idea that we sample the effects independently from a "fixed" (typically vague) prior distribution. We could simply code this up and fit it in JAGS as follows.

require(rjags)
n=dim(X)[1]
p=dim(X)[2]
data=list(X=X,n=n,p=p)
init=list(mu=2,tau=1)
modelstring="
  model {
    for (j in 1:p) {
      theta[j]~dnorm(0,0.0001)
      for (i in 1:n) {
        X[i,j]~dnorm(mu+theta[j],tau)
      }
    }
    mu~dnorm(0,0.0001)
    tau~dgamma(1,0.0001)
  }
"
model=jags.model(textConnection(modelstring),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("mu","tau","theta"),n.iter=100000,thin=10)
print(summary(output))
plot(output)
autocorr.plot(output)
pairs(as.matrix(output))
crosscorr.plot(output)

We can clearly see that this naive approach leads to high posterior correlation between the mean and the effects, due to the fundamental lack of identifiability of the model. This also leads to MCMC mixing problems, but it is important to understand that this computational issue is conceptually entirely separate from the fundamental statisticial identifiability issue. Even if we could avoid MCMC entirely, the identifiability issue would remain.

A quick fix for the identifiability issue is to use "treatment contrasts", just as for the frequentist model. We can implement that as follows.

data=list(X=X,n=n,p=p)
init=list(mu=2,tau=1)
modelstring="
  model {
    for (j in 1:p) {
      for (i in 1:n) {
        X[i,j]~dnorm(mu+theta[j],tau)
      }
    }
    theta[1]<-0
    for (j in 2:p) {
      theta[j]~dnorm(0,0.0001)
    }
    mu~dnorm(0,0.0001)
    tau~dgamma(1,0.0001)
  }
"
model=jags.model(textConnection(modelstring),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("mu","tau","theta"),n.iter=100000,thin=10)
print(summary(output))
plot(output)
autocorr.plot(output)
pairs(as.matrix(output))
crosscorr.plot(output)

Running this we see that the model now works perfectly well, mixes nicely, and gives sensible inferences for the treatment effects.

Another source of confusion for models of this type is data formating and indexing in JAGS models. For our balanced data there was not problem passing in data to JAGS as a matrix and specifying the model using nested loops. However, for unbalanced designs this is not necessarily so convenient, and so then it can be helpful to specify the model based on two-column data, as we would use for fitting using lm(). This is illustrated with the following model specification, which is exactly equivalent to the previous model, and should give identical (up to Monte Carlo error) results.

N=n*p
data=list(y=Data$values,g=Data$ind,N=N,p=p)
init=list(mu=2,tau=1)
modelstring="
  model {
    for (i in 1:N) {
      y[i]~dnorm(mu+theta[g[i]],tau)
    }
    theta[1]<-0
    for (j in 2:p) {
      theta[j]~dnorm(0,0.0001)
    }
    mu~dnorm(0,0.0001)
    tau~dgamma(1,0.0001)
  }
"
model=jags.model(textConnection(modelstring),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("mu","tau","theta"),n.iter=100000,thin=10)
print(summary(output))
plot(output)

As suggested above, this indexing scheme is much more convenient for unbalanced data, and hence widely used. However, since our data is balanced here, we will revert to the matrix approach for the remainder of the post.

One final thing to consider before moving on to random effects is the sum-contrast model. We can implement this in various ways, but I've tried to encode it for maximum clarity below, imposing the sum-to-zero constraint via the final effect.

data=list(X=X,n=n,p=p)
init=list(mu=2,tau=1)
modelstring="
  model {
    for (j in 1:p) {
      for (i in 1:n) {
        X[i,j]~dnorm(mu+theta[j],tau)
      }
    }
    for (j in 1:(p-1)) {
      theta[j]~dnorm(0,0.0001)
    }
    theta[p] <- -sum(theta[1:(p-1)])
    mu~dnorm(0,0.0001)
    tau~dgamma(1,0.0001)
  }
"
model=jags.model(textConnection(modelstring),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("mu","tau","theta"),n.iter=100000,thin=10)
print(summary(output))
plot(output)

Again, this works perfectly well and gives similar results to the frequentist analysis.

Random effects

The key difference between fixed and random effects in a Bayesian framework is that random effects are not independent, being drawn from a distribution with parameters which are not fixed. Essentially, there is another level of hierarchy involved in the specification of the random effects. This is best illustrated by example. A random effects model for this problem is given below.

data=list(X=X,n=n,p=p)
init=list(mu=2,tau=1)
modelstring="
  model {
    for (j in 1:p) {
      theta[j]~dnorm(0,taut)
      for (i in 1:n) {
        X[i,j]~dnorm(mu+theta[j],tau)
      }
    }
    mu~dnorm(0,0.0001)
    tau~dgamma(1,0.0001)
    taut~dgamma(1,0.0001)
  }
"
model=jags.model(textConnection(modelstring),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("mu","tau","taut","theta"),n.iter=100000,thin=10)
print(summary(output))
plot(output)

The only difference between this and our first naive attempt at a Bayesian fixed effects model is that we have put a gamma prior on the precision of the effect. Note that this model now runs and fits perfectly well, with reasonable mixing, and gives sensible parameter inferences. Although the effects here are not constrained to sum-to-zero, like in the case of sum contrasts for a fixed effects model, the prior encourages shrinkage towards zero, and so the random effect distribution can be thought of as a kind of soft version of a hard sum-to-zero constraint.

Since we have used simulated data here, we can compare the estimated random effects with the true effects generated during the simulation.

apply(as.matrix(output),2,mean)
RE

We see that the Bayesian random effects model has done an excellent job of estimation. If we wished, we could relax the assumption of common variance across the groups by making tau a vector indexed by j, though there is not much point in persuing this here, since we know that the groups do all have the same variance.

Strong subjective priors

The above is the usual story regarding fixed and random effects in Bayesian inference. I hope this is reasonably clear, so really I should quit while I'm ahead... However, the issues are really a bit more subtle than I've suggested. The inferred precision of the random effects was around 7,000, so now lets re-run the original, naive, "fixed effects" model with a strong subjective Bayesian prior on the distribution of the effects.

data=list(X=X,n=n,p=p)
init=list(mu=2,tau=1)
modelstring="
  model {
    for (j in 1:p) {
      theta[j]~dnorm(0,7000)
      for (i in 1:n) {
        X[i,j]~dnorm(mu+theta[j],tau)
      }
    }
    mu~dnorm(0,0.0001)
    tau~dgamma(1,0.0001)
  }
"
model=jags.model(textConnection(modelstring),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("mu","tau","theta"),n.iter=100000,thin=10)
print(summary(output))
plot(output)

This model also runs perfectly well and gives sensible inferences, despite the fact that the effects are iid from a fixed distribution and there is no hard constraint on the effects. So it isn't so much the fact that the effects are coupled via an extra level of hierarchy that makes things work. It's really the fact that the effects are sensibly distributed and not just sampled directly from a vague prior. So for "real" subjective Bayesians the line between fixed and random effects is actually very blurred indeed...

Exercise

Adapt the Bayesian random effects model to allow the variance of each group to be different. Verify that the model continues to give sensible inferences, and that the inferred group precisions are all very similar and consistent with the true known precision.

Darren J Wilkinson