Physica D 213 (2006) 190–196
www.elsevier.com/locate/physd

Selecting nonlinear stochastic process rate models using information criteria

David M. Walker a,*, Glenn Marion b

a Biomathematics and Statistics Scotland, The Macaulay Institute, Craigiebuckler, Aberdeen AB15 8QH, United Kingdom
b Biomathematics and Statistics Scotland, James Clerk Maxwell Building, King’s Buildings, Edinburgh EH9 3JZ, United Kingdom

Received 3 May 2005; received in revised form 7 November 2005; accepted 22 November 2005
Available online 20 December 2005
Communicated by H. Levine

Abstract

We demonstrate how unknown process rates within a stochastic modelling framework based on Markov processes can be approximated from time series data using polynomial basis functions. The problem of model selection is considered by adapting basis function selection methods and the minimum description length information criteria which have previously been developed for nonlinear autoregressive models of time series under Gaussian noise assumptions. We investigate the effectiveness of the methods with application to stochastic biological population models.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Stochastic process models; Model selection; Description length; Nonlinear models

1. Introduction

Stochastic modelling and simulation have been widely applied to understand and describe the behaviour of a range of complex phenomena including biological populations [1–6], epidemic dynamics [7] and chemical reactions [8]. The theory of stochastic processes [9] is also a natural framework in which to study so-called agent-based models in which agents interact with each other and their environment using simple local rules.
For example, in economics one can think of an agent buying, selling, or holding stock on the basis of limited information; in ecology a grazing animal may choose to graze a particular location, or decide to forage, depending on the local availability of resources or individual energy requirements [10]. The stochastic approach to modelling can not only account for variability and spatial heterogeneity, but also point the way towards better deterministic representations which model such variation using suitable limiting processes and approximations [11,7,2,3,5,12,10].

Despite the widespread use of stochastic process models, techniques which link them to observational data are somewhat limited, and methods are needed to estimate (the distribution of) parameter values from data and to perform model selection, that is, to select the model (from some defined class) which is best supported by the data. Recent advances in computational methods such as Markov chain Monte Carlo (MCMC) have enabled parameter estimation for discrete-time Markov models [13] and continuous-time Markovian [14–16] and non-Markovian processes [17]. MCMC methods have also been applied to enable model selection between simple epidemic models [18]. However, although in principle very flexible, MCMC methods are computationally intensive. More worryingly, except in special cases [19], there are no general results for deciding when, or if, the Markov chain has converged to the distribution of interest, so heuristic criteria are typically employed [20].

* Corresponding author.
E-mail addresses: d.walker@bioss.ac.uk (D.M. Walker), glenn@bioss.ac.uk (G. Marion).
0167-2789/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.physd.2005.11.007

In this paper we avoid the problems associated with MCMC by tackling model selection in Markov process models by discriminating between competing models using a novel application of a basis selection algorithm [21] and the minimum description length (MDL) principle [22] previously developed for model selection in nonlinear time series reconstruction. In [21] a deterministic predictive model of a system in an equivalent phase space is reconstructed from time series data under Gaussian noise assumptions. Our approach differs in that we attempt to reconstruct deterministic process rates of