Sampling for Approximate Inference in Continuous Time Bayesian Networks

Yu Fan, University of California, Riverside (yfan@cs.ucr.edu)
Christian R. Shelton, University of California, Riverside (cshelton@cs.ucr.edu)

Abstract

We first present a sampling algorithm for continuous time Bayesian networks based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. The three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the timeline. We present experimental results on their accuracies and time efficiencies, and compare them to expectation propagation.

1 Introduction

Many systems evolve asynchronously in continuous time, for example computer networks, sensor networks, mobile robots, and cellular metabolisms. Continuous time Bayesian networks (CTBNs) (Nodelman, Shelton, & Koller 2002) model such stochastic systems in continuous time using graphs to represent conditional independencies among discrete-valued processes. They have been applied to human-computer interactions (Nodelman & Horvitz 2003), server farm failures (Herbrich, Graepel, & Murphy 2004), and robot monitoring (Ng, Pfeffer, & Dearden 2005). A trajectory (sample) from a CTBN consists of the starting values for the system along with the (real-valued) times at which the variables change and their corresponding new values.

Inference for CTBNs is the task of estimating the distribution over trajectories given a partial trajectory (in which some values or transitions are missing for some variables during some time intervals). Performing exact inference in CTBNs is intractable. Recently, Nodelman, Koller, & Shelton (2005) presented an approximate inference method based on expectation propagation (Minka 2001). Saria, Nodelman, & Koller (2007) extended it to full belief propagation and provided a method to adapt the approximation quality.
In this paper we explore a different approach. Instead of approximating the distributions involved, we use sampling to approximate the expectation of a function of the trajectory. Sampling has the advantage of being an anytime algorithm: we can stop at any time during the computation and obtain an answer. Furthermore, in the limit of infinite samples (computation time), it converges to the true answer. Our algorithm is simple to implement. However, the formulation of this sampling procedure is not trivial due to the infinite extent of the trajectory space, both in the transition time continuum and the number of transitions.

Copyright © 2008, authors listed above. All rights reserved.

1.1 Previous Work

Sampling from dynamic systems is not new. However, most prior work has been in the area of discrete-time systems. Continuous-time systems pose different problems. As we note below, any evidence containing a record of the change in a variable has zero probability under the model. Therefore rejection sampling and straightforward likelihood weighting are not generally viable methods.

Ng, Pfeffer, & Dearden (2005) developed a continuous-time particle filtering algorithm. It only handled point evidence on binary and ternary discrete variables using rejection sampling, and focused primarily on the incorporation of evidence from the continuous state part of the system. By contrast, our algorithm does not incorporate real-valued state information, but it allows any evidence set and performs general inference (not just filtering). Our algorithm can be adapted to a population-based filter (a particle filter).

2 Continuous Time Bayesian Networks

Continuous time Bayesian networks (Nodelman, Shelton, & Koller 2002) are based on the framework of continuous time, finite state, homogeneous Markov processes. Let X be a continuous time, finite state, homogeneous Markov process with n states $\{x_1, \ldots, x_n\}$.
The behavior of X is described by the initial distribution $P_X^0$ and the intensity matrix

$$Q_X = \begin{bmatrix} -q_{x_1} & q_{x_1 x_2} & \cdots & q_{x_1 x_n} \\ q_{x_2 x_1} & -q_{x_2} & \cdots & q_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{x_n x_1} & q_{x_n x_2} & \cdots & -q_{x_n} \end{bmatrix},$$

where $q_{x_i x_j}$ is the intensity with which X transitions from $x_i$ to $x_j$ and $q_{x_i} = \sum_{j \neq i} q_{x_i x_j}$. The intensity matrix $Q_X$ is time invariant. Given $Q_X$, the amount of time X stays at $x_i$ follows an exponential distribution with parameter $q_{x_i}$. That is, the probability density function of X remaining at $x_i$ is $f(q_{x_i}, t) = q_{x_i} \exp(-q_{x_i} t)$. The probability that X transitions from state $x_i$ to $x_j$ is $\theta_{x_i x_j} = q_{x_i x_j}/q_{x_i}$. A conditional intensity matrix (CIM) $Q_{X|U}$ is defined as a set of intensity matrices $Q_{X|u}$, one for each instantiation u of the variable
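The dynamics above can be illustrated with a short sketch (not from the paper): given an intensity matrix $Q_X$, a single transition of the process is sampled by drawing an exponential holding time with parameter $q_{x_i}$ and then choosing the next state with probability $\theta_{x_i x_j}$. The matrix `Q` below is a hypothetical two-state example chosen for illustration.

```python
import random

def sample_transition(Q, i, rng=random):
    """Sample (holding time, next state) for a homogeneous Markov
    process currently in state i, with intensity matrix Q given as a
    list of rows whose off-diagonal entries are q_{x_i x_j} and whose
    diagonal entries are -q_{x_i}."""
    q_i = -Q[i][i]                 # total leaving intensity q_{x_i}
    t = rng.expovariate(q_i)       # holding time ~ Exp(q_{x_i})
    # Choose next state j != i with probability q_{x_i x_j} / q_{x_i}.
    r = rng.random() * q_i
    acc = 0.0
    for j, q_ij in enumerate(Q[i]):
        if j == i:
            continue
        acc += q_ij
        if r <= acc:
            return t, j
    return t, len(Q) - 1           # guard against floating-point roundoff

# Hypothetical 2-state intensity matrix: each row sums to zero.
Q = [[-1.0,  1.0],
     [ 2.0, -2.0]]
t, j = sample_transition(Q, 0)     # from state 0, the next state must be 1
```

Repeating this step from each newly entered state (and recording the accumulated times) produces exactly the kind of trajectory described above: a starting value plus a sequence of real-valued transition times and new values.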