Loading [MathJax]/extensions/TeX/boldsymbol.js

Stochastic simulator diagnostics

Colin Gillespie

Stochastic kinetic models

Suppose we have

  • N species: X_1, X_2, \ldots, X_N
  • M reactions: R_1, R_2, \ldots, R_M
  • In a typical model, M = 2N

Reaction R_i takes the form \begin{array}{cccc} R_{i}: \, & u_{i1} X_1+\ldots +u_{ik} X_N &\xrightarrow {\phantom{a}c_{i}\phantom{a}} & v_{i1}X_1+\ldots +v_{ik}X_N \end{array} The effect of reaction i on species j is to change X_j by an amount v_{ij} - u_{ij}

  • Straightforward to simulate

Direct method

  1. Initialisation: initial conditions, reactions constants, and random number generators
  2. Propensities update: Update each of the M hazard functions, h_i(x)
  3. Propensities total: Calculate the total hazard h_0 = \sum_{i=1}^M h_i(x)
  4. Reaction time: \tau = -ln[U(0,1)]/h_0 and t = t+ \tau
  5. Reaction selection: A reaction is chosen proportional to it's hazard
  6. Reaction execution: Update species
  7. Iteration: If the simulation time is exceeded stop, otherwise go back to step 2

Typically there are a large number of iterates

Simulator slow down

  • As the number of reactions (and species) increase, the length of time a single iteration takes also increases
  • Clever tricks help
    • Dynamic sorting of the reactions
    • Updating only species and hazards that have changed
    • Avoid simulating two uniform random numbers per iteration
  • For models where one of the hazards is large, simulation time can be prohibitively large

Approximate simulators

Relax some assumptions (e.g. discreteness and stochasticity) in order to make simulation faster and more scalable

  • Diffusion approximation / chemical Langevin equation (CLE)
  • Hybrid discrete-continuous models
  • Linear noise approximation (LNA) / moment closure (2MA)

How do can we assess the appropriateness of these methods?

Moment based approximations

  • Moment closure and the linear noise approximation
  • Obtain (differential) equations for the mean and covariance
  • Typically assume normality
  • Very useful for parameter inference

Dimerisation system

2X_1 \longrightarrow X_2 \quad \mbox{and} \quad X_2 \longrightarrow 2 X_1 with propensities h_1(X_1, X_2) = 0.5 k_1 X_1(X_1-1) and h_2(X_2) = k_2 X_2

Moment equations

The equation for the mean ODE is \frac{dE[X_1]}{dt} = 0.5 k_1 (E[X_1^2] - E[X_1]) -k_2 E[X_1] and second moment \frac{dE[X_1^2]}{dt} = k_1 (E[X_1^2X_2] - E[X_1 X_2]) + 0.5 k_1(E[X_1^2] - E[X_1]) k_2 (E[X_1] - 2 E[X_1^2]) where E[X_1] is the mean of X_1 and E[X_1^2] - E[X_1]^2 is the variance

Closing the system

Typically use Gaussian or Log-normal, since it's easy to generalise to the multivariate case

Distribution Moments Cumulants
Gaussian \mu_3 = 3 \mu_2 \mu_1 - 2 \mu_1^3 \kappa_3=0
Poisson \mu_3 = \mu_1+3\mu_1^2 + \mu_1^3 \kappa_3=\kappa_2=\kappa_1
Log-Normal \mu_3 = (\mu_2/\mu_1)^3 \kappa_3=(\kappa_2/\kappa_1)^3 + 3\kappa_2^2/\kappa_1

Rational rate laws

If we consider a slightly more complicated propensity function X \rightarrow \emptyset with propensity function h(X) = \frac{k_1 X^2}{k_2 + X^2} we get the moment equation \frac{dE(X)}{ dt} =\frac{1}{2k_2}\left[-4k_1 E(X^3) - (2k_2+1)\frac{d E(X^2)}{dt} - 2\frac{d E(X^3)}{dt} - \frac{d E(X^4)}{dt}\right].

Why use MC/LN approximations?

  • Model building
    • a quick method for estimating the mean/variance of system
  • Steady-state analysis
    • set the differentials to zero
  • Parameter inference
    • instead of multiple forward simulations (per MCMC iteration), solve a set of ODEs
    • Lotka-Volterra model - 10^4 iterations in less than 2 seconds
  • Experimental design
    • Each iteration of the algorithm reduces to the parameter inference problem

Example: Heat shock model (Proctor et al, 2005)

  • Stochastic kinetic model of the heat shock system
    • twenty-three reactions
    • seventeen chemical species
  • A single stochastic simulation up to t=2000 takes about 35 minutes.
  • If we convert the model to moment equations, we get 139 equations

Example: Heat shock model

Issues

  • When modelling biological systems, it's unusual to have detailed knowledge about all rate constants
    • Vague parameters over many orders of magnitude
    • Uncertain initial conditions
    • Unobserved species
  • How can we assess the appropriateness of the MC/LN approximation?

P53-Mdm2 oscillation model

  • Proctor and Grey, 2008
    • 16 chemical species
    • Around a dozen reactions
  • The model contains a single event
    • At t = 1, set X = 0
  • If we convert the model to moment equations, we get 139 equations.
0100,0000100200300TimeX(t)

SIR model

  • Immigration: \emptyset \longrightarrow X + 1 and \emptyset \xrightarrow{k_2 sf} Y + 1
  • Death: X \xrightarrow{k_3 sf} \emptyset and Y \longrightarrow \emptyset
  • Interaction: X +Y \longrightarrow 2 Y

Assessing the moment closure approximations

  • Accuracy of the mean and variance estimates
    • For any system with a second-order reaction, we can prove that the MC/LN approximations are not exact
  • Suitability of the normality assumption
    • Continuous distribution for a discrete state space
  • The question is, are the approximations good enough?

Assessing moment accuracy

  • If the system contains only zero and first order reactions, the MC/LN moment estimates are exact
  • Intuitively, if the MC/LN produce different moment estimates, this suggests an issue with the approximation
    • Note - if the MC/LN approximations agree, this does not imply accuracy
  • For a particular set of parameters, \gamma_i = (c_1, \ldots, c_L), calculate the standardised difference D_{j}^{LM}(\mathbf{\gamma}_i) = \frac{\psi_j^l(\mathbf{\gamma}_i) - \psi_j^m(\mathbf{\gamma}_i)}{\sqrt{\Sigma_{jj}^l(\mathbf{\gamma}_i)}} where
    • \psi_j(\mathbf{\gamma}_i) is the solution to the equation for species j for parameters \gamma_i using the LN or MC approximation
    • \Sigma_{jj} (\mathbf{\gamma}_i) is the variance estimate
  • Large values of D_j^{LM} indicate a potential issues with the approximation

Lotka Volterra system

  • Simple system with non-linear dynamics X \rightarrow X + 1, \quad X + Y \rightarrow 2Y \quad \text{and} \quad Y \rightarrow \emptyset
  • Under certain parameter combinations, deterministic model predicts sustained oscillations
    • But the stochastic system can become extinct
  • Set-up:
    • Fix c_3=0.3 and X(0) = Y(0)=100
    • Maximum simulation time t=30
  • Assess the moment closure and the linear noise approximations over the parameter ranges c_1 \sim (10^{-6}, 10^0) \quad \text{and} \quad c_2 \sim (10^{-6}, 10^0) on the log scale

LV moment assessment

  • Form a Latin hypercube on the log scale
  • At each point, \gamma_i on the hyper cube calculate the standardised difference D_{j}^{LM}(\mathbf{\gamma}_i)
    • Values of D_{j}^{LM}(\mathbf{\gamma}_i) > 5 are shown as red triangles

Assessing the normality

  • We can borrow ideas from Gaussian process assessment
  • At each point, \gamma_i, on the design space
    • estimate the mean, \psi, and co-variance, \Sigma, using the LN/MC approximations
    • simulate using an exact simulator
  • Prediction error for species j,
    e_{i,j} = y_j^*(\mathbf{\gamma}_i) - \psi_j^{l}(\mathbf{\gamma}_i)
  • Easier to work with the standardised prediction errors e_{i,j}^* = \frac{y_j^*(\mathbf{\gamma}_i) - \psi_j^l(\mathbf{\gamma}_i)}{\sqrt{\Sigma_{jj}^l(\mathbf{\gamma}_i)}} \;.
  • If the assumptions are correct, then e_{i,j}^* \sim N(0, 1)

Interval diagnostics

  • Construct 100\alpha% confidence interval using the approximate mean \psi and variance, \Sigma
  • Denote a particular confidence region at design point i, for species j, as CI_{i,j} (\alpha)
  • The proportion of simulated values that land within the confidence region is given by D_j^{CR} = \frac{1}{n_d} \sum_{i=1}^{n_d} \mathbf{1}[y_j^*(\mathbf{\gamma}_i) \in CI_{i,j}(\alpha)], where \mathbf{1}[\cdot] is the indicator function. The value of D_{j}^{CR} should be approximately equal to \alpha

Schlogl model

  • A well known test system that exhibits bi-modal and non-linear characteristics at certain parameter combinations
  • The system contains four reactions 2X_1 + X_2 \xrightarrow{\phantom{a}c_{1}\phantom{a}} 3X_1, \quad 3X_1 \xrightarrow{\phantom{a}c_{2}\phantom{a}} 2X_1 + X_2, \quad X_3 \xrightarrow{\phantom{a}c_{3}\phantom{a}} X_1 \\ X_1 \xrightarrow{\phantom{a}c_{4}\phantom{a}} X_3 where the distribution of X_1 is bi-modal, the Gaussian assumption would clearly be inappropriate
  • Parameters \{c_1, c_2\} are fixed at \{3\times 10^{-7}, 10^{-4}\} and the initial conditions are held constant at \{X_1(0), X_2(0), X_3(0)\} = \{250, 10^5, 2\times 10^5\} \;.
  • Two dimensional hypercube size n_d=1,000 on the \log_{10} space for the parameters \{c_3, c_4\}
  • At each of the n_d points on the hypercube, we simulate using the direct method and the LNA to a maximum time of t=5.

Schlogl model: diagnostics

  • Latin hyper-cube
  • Standardised errors
  • Q-Q plot
  • X_1 vs e_i^*
0.00.20.40.60.81.00.00.10.20.30.40.50.60.70.80.91.0

Schlogl model: simulations

  • Investigate large standardised errors more carefully 050200400600800TimeX(t)

Summary

  • Moment closure/linear approximations provide a quick, easy to construct approximation to the system of interest
    • Intelligent emulator
  • Care needs to be taken to assess the suitability of the approximation
  • Assessing the accuracy of other approximate simulators is tricky (arXiv:1409.1096)

References

  • Gillespie, CS. Moment closure approximations for mass-action models. IET Systems Biology 2009
  • Milner, P, Gillespie, CS and Wilkinson, DJ. Moment closure approximations for stochastic kinetic models with rational rate laws. Mathematical Biosciences 2011
  • Gillespie, C.S. Diagnostics for assessing the accuracy of approximate stochastic simulators, arXiv:1409.1096