Markov chain Monte Carlo algorithms for SDE parameter estimation

Andrew Golightly and Darren J. Wilkinson

April 1, 2008

Abstract

This chapter considers stochastic differential equations for Systems Biology models derived from the Chemical Langevin Equation (CLE). After outlining the derivation of such models, Bayesian inference for the parameters is considered, based on state-of-the-art Markov chain Monte Carlo algorithms. Starting with a basic scheme for models observed perfectly, but discretely in time, problems with standard schemes and their solutions are discussed. Extensions of these schemes to partial observation and observations subject to measurement error are also considered. Finally, the techniques are demonstrated in the context of a simple stochastic kinetic model of a genetic regulatory network.

1 Introduction

It is now well recognised that the dynamics of many genetic and biochemical networks are intrinsically stochastic. Stochastic kinetic models provide a powerful framework for modelling such dynamics; see, for example, McAdams & Arkin (1997) and Arkin et al. (1998) for some early examples. In principle such stochastic kinetic models correspond to discrete-state Markov processes that evolve continuously in time (Wilkinson 2006). Such processes can be simulated on a computer using a discrete-event simulation technique such as the Gillespie algorithm (Gillespie 1977). Since many of the parameters governing these models will be uncertain, it is natural to want to estimate them using experimental data. Although it is possible in principle to use a range of different data for this task, time course data on the amounts of biomolecules at the single-cell level are the most informative. Boys et al. (2008) show that it is possible to directly infer rate constants of stochastic kinetic models using fully Bayesian inference and sophisticated Markov chain Monte Carlo (MCMC) algorithms.
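To make the discrete-event simulation concrete, the following is a minimal sketch of Gillespie's direct method for a generic stochastic kinetic model. The model used to exercise it, an immigration-death system with the rate constants shown, is a hypothetical illustration chosen here rather than an example taken from the chapter.

```python
import random

def gillespie(x0, rates, stoich, hazards, t_max, seed=None):
    """Gillespie's direct method (Gillespie 1977), a sketch.

    x0      : initial molecule counts (list of ints)
    rates   : rate constants (list of floats)
    stoich  : state-change vector for each reaction (list of lists)
    hazards : function (x, rates) -> list of reaction hazards
    Returns the jump times and the state after each jump.
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    times, states = [t], [list(x)]
    while t < t_max:
        h = hazards(x, rates)
        h0 = sum(h)
        if h0 <= 0.0:                 # no reaction can fire
            break
        t += rng.expovariate(h0)      # exponential waiting time
        if t > t_max:
            break
        u, acc, j = rng.random() * h0, 0.0, 0
        for j, hj in enumerate(h):    # choose reaction j with prob. h_j / h0
            acc += hj
            if u < acc:
                break
        x = [xi + s for xi, s in zip(x, stoich[j])]
        times.append(t)
        states.append(list(x))
    return times, states

# Hypothetical immigration-death model: 0 -> X at rate c1, X -> 0 at rate c2*x.
hazards = lambda x, c: [c[0], c[1] * x[0]]
times, states = gillespie([0], [10.0, 0.5], [[1], [-1]], hazards,
                          t_max=50.0, seed=1)
```

Each iteration draws the time to the next reaction from an exponential distribution with rate equal to the combined hazard, then selects which reaction fires in proportion to the individual hazards, so the simulated path is an exact realisation of the underlying Markov jump process.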
However, the techniques are highly computationally intensive, and do not scale up to problems of practical interest in Systems Biology. It seems unlikely that fully Bayesian inferential techniques of practical value can be developed based on the original Markov jump process formulation of stochastic kinetic models, at least given currently available computing hardware.

It is therefore natural to develop techniques which exploit some kind of approximation in order to speed up computations. One possibility, explored in Boys et al. (2008), is to work with the exact model, but to introduce approximations into the Bayesian inference algorithm. Although this approach does have some promise, it seems difficult to speed up the algorithm sufficiently for practical purposes without sacrificing too much inferential accuracy. An alternative approach is to approximate the model, and then conduct exact Bayesian inference for the approximate model. This latter approach is much more flexible, as there are many approximations to the underlying model which can be made, and it is easier to understand the accuracy of the proposed approximations and the likely benefit in terms of computational speed-up.

Perhaps the most obvious approach would be to use a deterministic approximation, such as that based on the reaction rate equations (Gillespie 1992). However, such an approach performs very badly when the underlying process has a significant degree of stochasticity, as the reaction rate equations (RREs) effectively "throw away" all of the stochasticity in the process, and consequently, all of the information in the process "noise". The information in the noise is often quite substantial, and needs to be utilised for effective inference.
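To illustrate why the RRE approximation discards the process noise, the following sketch integrates the deterministic rate equation for a hypothetical immigration-death system (production at rate c1, degradation at rate c2 per molecule, so dx/dt = c1 - c2*x); the model and parameter values are illustrative assumptions, not taken from the chapter.

```python
def rre_euler(x0, c1, c2, t_max, dt=0.01):
    """Forward-Euler integration of the RRE dx/dt = c1 - c2*x.

    A deliberately simple solver: the point is the qualitative
    behaviour of the deterministic approximation, not accuracy.
    """
    x, traj = x0, [x0]
    for _ in range(int(t_max / dt)):
        x += (c1 - c2 * x) * dt
        traj.append(x)
    return traj

# With c1 = 10, c2 = 0.5 the trajectory relaxes smoothly to the
# equilibrium c1 / c2 = 20, with no fluctuations at all; every run
# is identical, so all information in the process noise is lost.
traj = rre_euler(0.0, 10.0, 0.5, t_max=50.0)
```

Contrast this with a stochastic simulation of the same system, where individual trajectories fluctuate around that mean: those fluctuations carry information about the rate constants that a deterministic fit cannot exploit.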