## Stochastic kinetic models

Suppose we have

• $$N$$ species: $$X_1, X_2, \ldots, X_N$$
• $$M$$ reactions: $$R_1, R_2, \ldots, R_M$$
• In a typical model, $$M = 2N$$

Reaction $$R_i$$ takes the form $\begin{array}{cccc} R_{i}: \, & u_{i1} X_1+\ldots +u_{ik} X_N &\xrightarrow {\phantom{a}c_{i}\phantom{a}} & v_{i1}X_1+\ldots +v_{ik}X_N \end{array}$ The effect of reaction $$i$$ on species $$j$$ is to change $$X_j$$ by an amount $$v_{ij} - u_{ij}$$

• Straightforward to simulate

## Direct method

1. Initialisation: initial conditions, reactions constants, and random number generators
2. Propensities update: Update each of the $$M$$ hazard functions, $$h_i(x)$$
3. Propensities total: Calculate the total hazard $$h_0 = \sum_{i=1}^M h_i(x)$$
4. Reaction time: $$\tau = -ln[U(0,1)]/h_0$$ and $$t = t+ \tau$$
5. Reaction selection: A reaction is chosen proportional to it's hazard
6. Reaction execution: Update species
7. Iteration: If the simulation time is exceeded stop, otherwise go back to step 2

Typically there are a large number of iterates

## Simulator slow down

• As the number of reactions (and species) increase, the length of time a single iteration takes also increases
• Clever tricks help
• Dynamic sorting of the reactions
• Updating only species and hazards that have changed
• Avoid simulating two uniform random numbers per iteration
• For models where one of the hazards is large, simulation time can be prohibitively large

## Approximate simulators

Relax some assumptions (e.g. discreteness and stochasticity) in order to make simulation faster and more scalable

• Diffusion approximation / chemical Langevin equation (CLE)
• Hybrid discrete-continuous models
• Linear noise approximation (LNA) / moment closure (2MA)

How do can we assess the appropriateness of these methods?

## Moment based approximations

• Moment closure and the linear noise approximation
• Obtain (differential) equations for the mean and covariance
• Typically assume normality
• Very useful for parameter inference

## Dimerisation system

$2X_1 \longrightarrow X_2 \quad \mbox{and} \quad X_2 \longrightarrow 2 X_1$ with propensities $$h_1(X_1, X_2) = 0.5 k_1 X_1(X_1-1)$$ and $$h_2(X_2) = k_2 X_2$$

### Moment equations

The equation for the mean ODE is $\frac{dE[X_1]}{dt} = 0.5 k_1 (E[X_1^2] - E[X_1]) -k_2 E[X_1]$ and second moment $\frac{dE[X_1^2]}{dt} = k_1 (E[X_1^2X_2] - E[X_1 X_2]) + 0.5 k_1(E[X_1^2] - E[X_1]) k_2 (E[X_1] - 2 E[X_1^2])$ where $$E[X_1]$$ is the mean of $$X_1$$ and $$E[X_1^2] - E[X_1]^2$$ is the variance

## Closing the system

Typically use Gaussian or Log-normal, since it's easy to generalise to the multivariate case

Distribution Moments Cumulants
Gaussian $$\mu_3 = 3 \mu_2 \mu_1 - 2 \mu_1^3$$ $$\kappa_3=0$$
Poisson $$\mu_3 = \mu_1+3\mu_1^2 + \mu_1^3$$ $$\kappa_3=\kappa_2=\kappa_1$$
Log-Normal $$\mu_3 = (\mu_2/\mu_1)^3$$ $$\kappa_3=(\kappa_2/\kappa_1)^3 + 3\kappa_2^2/\kappa_1$$

## Rational rate laws

If we consider a slightly more complicated propensity function $X \rightarrow \emptyset$ with propensity function $h(X) = \frac{k_1 X^2}{k_2 + X^2}$ we get the moment equation $\frac{dE(X)}{ dt} =\frac{1}{2k_2}\left[-4k_1 E(X^3) - (2k_2+1)\frac{d E(X^2)}{dt} - 2\frac{d E(X^3)}{dt} - \frac{d E(X^4)}{dt}\right].$

## Why use MC/LN approximations?

• Model building
• a quick method for estimating the mean/variance of system
• set the differentials to zero
• Parameter inference
• instead of multiple forward simulations (per MCMC iteration), solve a set of ODEs
• Lotka-Volterra model - $$10^4$$ iterations in less than 2 seconds
• Experimental design
• Each iteration of the algorithm reduces to the parameter inference problem

## Example: Heat shock model (Proctor et al, 2005)

• Stochastic kinetic model of the heat shock system
• twenty-three reactions
• seventeen chemical species
• A single stochastic simulation up to $$t=2000$$ takes about $$35$$ minutes.
• If we convert the model to moment equations, we get $$139$$ equations

## Issues

• When modelling biological systems, it's unusual to have detailed knowledge about all rate constants
• Vague parameters over many orders of magnitude
• Uncertain initial conditions
• Unobserved species
• How can we assess the appropriateness of the MC/LN approximation?

## P53-Mdm2 oscillation model

• Proctor and Grey, 2008
• 16 chemical species
• Around a dozen reactions
• The model contains a single event
• At $$t = 1$$, set $$X = 0$$
• If we convert the model to moment equations, we get 139 equations.

## SIR model

• Immigration: $$\emptyset \longrightarrow X + 1$$ and $$\emptyset \xrightarrow{k_2 sf} Y + 1$$
• Death: $$X \xrightarrow{k_3 sf} \emptyset$$ and $$Y \longrightarrow \emptyset$$
• Interaction: $$X +Y \longrightarrow 2 Y$$

## Assessing the moment closure approximations

• Accuracy of the mean and variance estimates
• For any system with a second-order reaction, we can prove that the MC/LN approximations are not exact
• Suitability of the normality assumption
• Continuous distribution for a discrete state space
• The question is, are the approximations good enough?

## Assessing moment accuracy

• If the system contains only zero and first order reactions, the MC/LN moment estimates are exact
• Intuitively, if the MC/LN produce different moment estimates, this suggests an issue with the approximation
• Note - if the MC/LN approximations agree, this does not imply accuracy
• For a particular set of parameters, $$\gamma_i = (c_1, \ldots, c_L)$$, calculate the standardised difference $D_{j}^{LM}(\mathbf{\gamma}_i) = \frac{\psi_j^l(\mathbf{\gamma}_i) - \psi_j^m(\mathbf{\gamma}_i)}{\sqrt{\Sigma_{jj}^l(\mathbf{\gamma}_i)}}$ where
• $$\psi_j(\mathbf{\gamma}_i)$$ is the solution to the equation for species $$j$$ for parameters $$\gamma_i$$ using the LN or MC approximation
• $$\Sigma_{jj} (\mathbf{\gamma}_i)$$ is the variance estimate
• Large values of $$D_j^{LM}$$ indicate a potential issues with the approximation

## Lotka Volterra system

• Simple system with non-linear dynamics $X \rightarrow X + 1, \quad X + Y \rightarrow 2Y \quad \text{and} \quad Y \rightarrow \emptyset$
• Under certain parameter combinations, deterministic model predicts sustained oscillations
• But the stochastic system can become extinct
• Set-up:
• Fix $$c_3=0.3$$ and $$X(0) = Y(0)=100$$
• Maximum simulation time $$t=30$$
• Assess the moment closure and the linear noise approximations over the parameter ranges $c_1 \sim (10^{-6}, 10^0) \quad \text{and} \quad c_2 \sim (10^{-6}, 10^0)$ on the log scale

## LV moment assessment

• Form a Latin hypercube on the log scale
• At each point, $$\gamma_i$$ on the hyper cube calculate the standardised difference $$D_{j}^{LM}(\mathbf{\gamma}_i)$$
• Values of $$D_{j}^{LM}(\mathbf{\gamma}_i) > 5$$ are shown as red triangles

## Assessing the normality

• We can borrow ideas from Gaussian process assessment
• At each point, $$\gamma_i$$, on the design space
• estimate the mean, $$\psi$$, and co-variance, $$\Sigma$$, using the LN/MC approximations
• simulate using an exact simulator
• Prediction error for species $$j$$,
$e_{i,j} = y_j^*(\mathbf{\gamma}_i) - \psi_j^{l}(\mathbf{\gamma}_i)$
• Easier to work with the standardised prediction errors $e_{i,j}^* = \frac{y_j^*(\mathbf{\gamma}_i) - \psi_j^l(\mathbf{\gamma}_i)}{\sqrt{\Sigma_{jj}^l(\mathbf{\gamma}_i)}} \;.$
• If the assumptions are correct, then $$e_{i,j}^* \sim N(0, 1)$$

## Interval diagnostics

• Construct $$100\alpha$$% confidence interval using the approximate mean $$\psi$$ and variance, $$\Sigma$$
• Denote a particular confidence region at design point $$i$$, for species $$j$$, as $$CI_{i,j} (\alpha)$$
• The proportion of simulated values that land within the confidence region is given by $D_j^{CR} = \frac{1}{n_d} \sum_{i=1}^{n_d} \mathbf{1}[y_j^*(\mathbf{\gamma}_i) \in CI_{i,j}(\alpha)],$ where $$\mathbf{1}[\cdot]$$ is the indicator function. The value of $$D_{j}^{CR}$$ should be approximately equal to $$\alpha$$

## Schlogl model

• A well known test system that exhibits bi-modal and non-linear characteristics at certain parameter combinations
• The system contains four reactions $2X_1 + X_2 \xrightarrow{\phantom{a}c_{1}\phantom{a}} 3X_1, \quad 3X_1 \xrightarrow{\phantom{a}c_{2}\phantom{a}} 2X_1 + X_2, \quad X_3 \xrightarrow{\phantom{a}c_{3}\phantom{a}} X_1 \\ X_1 \xrightarrow{\phantom{a}c_{4}\phantom{a}} X_3$ where the distribution of $$X_1$$ is bi-modal, the Gaussian assumption would clearly be inappropriate
• Parameters $$\{c_1, c_2\}$$ are fixed at $$\{3\times 10^{-7}, 10^{-4}\}$$ and the initial conditions are held constant at $\{X_1(0), X_2(0), X_3(0)\} = \{250, 10^5, 2\times 10^5\} \;.$
• Two dimensional hypercube size $$n_d=1,000$$ on the $$\log_{10}$$ space for the parameters $$\{c_3, c_4\}$$
• At each of the $$n_d$$ points on the hypercube, we simulate using the direct method and the LNA to a maximum time of $$t=5$$.

## Schlogl model: diagnostics

• Latin hyper-cube
• Standardised errors
• Q-Q plot
• $$X_1$$ vs $$e_i^*$$

## Schlogl model: simulations

• Investigate large standardised errors more carefully

## Summary

• Moment closure/linear approximations provide a quick, easy to construct approximation to the system of interest
• Intelligent emulator
• Care needs to be taken to assess the suitability of the approximation
• Assessing the accuracy of other approximate simulators is tricky (arXiv:1409.1096)

### References

• Gillespie, CS. Moment closure approximations for mass-action models. IET Systems Biology 2009
• Milner, P, Gillespie, CS and Wilkinson, DJ. Moment closure approximations for stochastic kinetic models with rational rate laws. Mathematical Biosciences 2011
• Gillespie, C.S. Diagnostics for assessing the accuracy of approximate stochastic simulators, arXiv:1409.1096