MAS8381: Statistics for Big Data

Stochastic simulation and Bayesian computing


Make sure everything is working

The main R script for this practical is sim1.R. To make sure everything is working, start by simply running the whole script, pressing return when prompted in order to advance to the next plot.

Once everything is working properly, go through the script bit by bit making sure everything makes sense.

Basic Monte Carlo

The first part of the script carries out some basic Monte Carlo simulation and analyses the results.
cat("Starting script sim1.R\n")
op=par(mfrow=c(2,2),ask=TRUE)

cat("Simulate some normal random quantities and analyse them\n")
x=rnorm(200,2,3)
print(summary(x))
cat(paste("SD",sd(x),"\n"))
hist(x,freq=FALSE,col=2)
curve(dnorm(x,2,3),-5,10,add=TRUE,col=4,lwd=2)
qqnorm(x,col=2)
qqline(x,col=4,lwd=2)
This bit of the script just simulates 200 N(2,9) random quantities (storing them in a vector called x), computes some summary statistics, and produces two plots: a histogram with the true N(2,9) density superimposed, and a normal Q-Q plot.
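
The same ideas extend to Monte Carlo estimation of quantities of interest. As a minimal sketch (not part of sim1.R), the following estimates P(X > 5) for X ~ N(2,9), attaches a standard error to the estimate, and compares it against the exact value from pnorm().
# Monte Carlo sketch (not in sim1.R): estimate P(X > 5) for X ~ N(2,9)
n=100000
xs=rnorm(n,2,3)
phat=mean(xs>5)                       # Monte Carlo estimate of the probability
se=sqrt(phat*(1-phat)/n)              # binomial standard error of the estimate
c(estimate=phat,std.error=se,exact=pnorm(5,2,3,lower.tail=FALSE))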

Importance resampling

Here a large sample of Cauchy random quantities is importance-resampled to give an approximate standard normal random sample, using the density ratio dnorm(y)/dcauchy(y) as the importance weights. Note that the sample() command automatically normalises the supplied weights.
cat("Importance resampling\n")
y=rcauchy(10000)
w=dnorm(y)/dcauchy(y)
print(mean(w))
x=sample(y,replace=TRUE,prob=w)
print(summary(x))
print(sd(x))
hist(x,20,freq=FALSE,col=2)
curve(dnorm(x),-3,3,add=TRUE,col=4,lwd=2)
qqnorm(x,col=2)
qqline(x,col=4,lwd=2)
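
Resampling is not the only way to use the weights. A self-normalised importance sampling estimate of an expectation can be formed directly from y and w, without resampling at all. The sketch below (not in sim1.R) estimates E(X) and E(X^2) for the standard normal target from the Cauchy sample, and computes the effective sample size implied by the weights.
# Importance sampling sketch (not in sim1.R): use the weights directly, no resampling
wn=w/sum(w)                           # normalised importance weights
sum(wn*y)                             # estimate of E(X), should be close to 0
sum(wn*y^2)                           # estimate of E(X^2), should be close to 1
1/sum(wn^2)                           # effective sample size implied by the weights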

Gibbs sampler

This part of the script runs the normgibbs Gibbs sampler from the smfsb R package.
cat("Gibbs sampler\n")
require(smfsb)
# example(normgibbs)
x=rnorm(15,25,2)
postmat=normgibbs(N=10100,n=length(x),a=3,b=11,cc=10,d=1/100,xbar=mean(x),ssquared=var(x))
colnames(postmat)=c("mu","tau")
postmat=postmat[101:dim(postmat)[1],]
mcmcSummary(postmat,rows=2)
Be sure to inspect the normgibbs() function by entering normgibbs at the R command prompt. Also get documentation on it by entering ?normgibbs and run the example by entering example(normgibbs). Do the same for the mcmcSummary() function, which is also part of the smfsb package.
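
To see what such a sampler looks like written out by hand, the sketch below implements a Gibbs sampler for the same kind of semi-conjugate normal model. It assumes the priors mu ~ N(cc,1/d) and tau ~ Ga(a,b); this matches the usual setup, but check ?normgibbs to confirm the exact parameterisation the package uses before relying on it.
# Illustrative hand-rolled Gibbs sampler, assuming priors mu~N(cc,1/d), tau~Ga(a,b)
# (a sketch only -- see ?normgibbs for the package's own parameterisation)
mygibbs=function(N,x,a,b,cc,d){
  n=length(x); xbar=mean(x); ss=sum((x-xbar)^2)
  mat=matrix(0,N,2); colnames(mat)=c("mu","tau")
  mu=xbar; tau=1/var(x)               # crude starting values
  for (i in 1:N) {
    tau=rgamma(1,a+n/2,b+0.5*(ss+n*(xbar-mu)^2))      # full conditional for tau
    prec=d+n*tau                                      # posterior precision for mu
    mu=rnorm(1,(d*cc+n*tau*xbar)/prec,1/sqrt(prec))   # full conditional for mu
    mat[i,]=c(mu,tau)
  }
  mat
}
Output from mcmcSummary(mygibbs(10000,x,3,11,10,1/100)[101:10000,],rows=2) should broadly resemble the normgibbs run above, up to Monte Carlo error.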

Metropolis MCMC sampling

The next part of the script runs a Metropolis sampler using the metrop() function from the smfsb package. Inspect the metrop() function and read its associated documentation before proceeding further.
cat("Metropolis sampler\n")
# demo(Metropolis)
alpha=10^(-1:2)
mat=sapply(alpha,function(alpha){metrop(10000,alpha)})
colnames(mat)=paste("alpha =",alpha)
mcmcSummary(mat)
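
The tuning parameter alpha clearly affects how well each chain mixes. One simple numerical summary of this (a sketch, not part of the script) is the lag-one autocorrelation of each chain, which also points towards Exercise 2 below.
# Sketch: lag-one autocorrelation for each tuning value (smaller means better mixing)
apply(mat,2,function(chain) acf(chain,plot=FALSE)$acf[2])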

CODA

The mcmcSummary() function from the smfsb package is good for having a quick look at MCMC output, but for more sophisticated analysis and diagnostics, it is worth learning how to use the coda package.
cat("coda\n")
require(coda)
codamat=mcmc(postmat)
print(summary(codamat))
plot(codamat)
geweke.diag(codamat)
geweke.plot(codamat)
cat("Effective sample size:\n")
print(effectiveSize(codamat))
# codamenu()
Get help on the package in the usual R way, starting with help(package="coda"). Note that codamenu() allows interactive analysis and diagnostics of a coda mcmc object.
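
There are many more diagnostics in coda worth exploring. As a small sketch (reusing the codamat object from above), the following computes autocorrelations at a few standard lags and 95% highest posterior density intervals.
# Further coda diagnostics (sketch, reusing codamat from above)
autocorr.diag(codamat)                # autocorrelations at standard lags
HPDinterval(codamat)                  # 95% highest posterior density intervals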

Exercises

1. Derive a Metropolis random walk algorithm for sampling from a Ga(3,2) density using N(0,1) innovations. Code it up in R and verify that it works correctly.
2. Modify the code from Exercise 1 to use N(0,eps) innovations. Find the value of eps which minimises the lag-one autocorrelation in the chain.

Darren J Wilkinson