MAS8381: Statistics for Big Data

Non-Gaussian state-space model


The model here is for the number of hits on a certain website in a sequence of one minute intervals. The number of hits in each minute is assumed to be Poisson, and the log of the Poisson intensity is assumed to vary slowly over time according to a linear Gaussian AR(1) process.

# simulate some data
n=1000
x=numeric(n+1)
y=numeric(n)
x[1]=rnorm(1,0,10)
for (i in 1:n) {
  x[i+1]=2+0.9*(x[i]-2)+rnorm(1,0,3)
  y[i]=rpois(1,exp(x[i+1]))
}


# fit the model
require(rjags)
data=list(y=y,n=n)
init=list(x=rep(0,n+1),mu=0,alpha=0,tau=1)
modelS="
  model {
    x[1]~dnorm(0,0.0001)
    for (i in 1:n) {
      x[i+1]~dnorm(mu+alpha*(x[i]-mu),tau)
      y[i]~dpois(exp(x[i+1]))
    }
    mu~dnorm(0,0.001)
    alpha~dnorm(0,0.0001)
    tau~dgamma(1,0.001)
  }
"
model=jags.model(textConnection(modelS),data=data,inits=init)
update(model,n.iter=1000)
output=coda.samples(model=model,variable.names=c("alpha","mu","tau"),n.iter=10000,thin=1)
print(summary(output))
plot(output)
The code first simulates some data and then fits the model to the simulated data using JAGS. Run it and make sure it works, and then go through it line by line to ensure that you understand it.

Exercises

Investigate identifiability issues in this model.
  • 1. For the parameters used to simulate the data, how many observations are needed in order to identify the model parameters to a high degree of precision?
  • 2. How is this affected by the parameters of the AR(1) process used to simulate the data? eg. Do you need more or less data if the mean of the AR(1) process is decreased? What if the auto-correlation parameter is increased? Note that the auto-correlation parameter should not exceed 1.

  • Darren J Wilkinson