Estimation of Non-stationary Markov Chain Transition Models

L. F. Bertuccelli and J. P. How
Aerospace Controls Laboratory, Massachusetts Institute of Technology
{lucab, jhow}@mit.edu

Abstract— Many decision systems rely on a precisely known Markov Chain model to guarantee optimal performance, and this paper considers the online estimation of unknown, non-stationary Markov Chain transition models with perfect state observation. Using a prior Dirichlet distribution on the uncertain rows, we derive a mean-variance equivalent of the Maximum A Posteriori (MAP) estimator. This recursive mean-variance estimator extends previous methods that recompute the moments at each time step using observed transition counts. We show that this mean-variance estimator responds slowly to changes in the transition model (especially for switching models), and we introduce a modification, based on pseudonoise addition from classical filtering, to speed up the response of the estimator. This new, discounted mean-variance estimator has the intuitive interpretation of fading previous observations and provides a link to fading techniques used in Hidden Markov Model estimation. Our new estimation technique is both faster and has lower error than alternative estimation techniques, such as finite memory estimators.

I. INTRODUCTION

Many decision processes, such as Markov Decision Processes (MDPs) and Jump Markov Linear systems, are modeled as probabilistic processes driven by a Markov Chain. The true parameters of the Markov Chain are frequently unavailable to the modeler, and many researchers have recently addressed the issue of robust performance in these decision systems [4], [6], [13], [16]. However, a large body of research has also been devoted to the identification of the Markov Chain from available observations. With few exceptions (such as in the signal processing community [11], [17]), most of this research has addressed the case of a unique, stationary model.
When the transition matrix Π of a Markov Chain is stationary, classical maximum likelihood (ML) schemes [9], [17] can be used to recursively obtain the best estimate ˆΠ of the transition matrix. Typical Bayesian methods assume a prior Dirichlet distribution on each row of the transition matrix, and exploit the conjugacy of the Dirichlet distribution with the multinomial distribution to recursively compute ˆΠ. This technique amounts to evaluating the empirical frequency of the transitions to obtain an ML or Maximum A Posteriori (MAP) estimate of the transition matrix. In the limit of an infinite observation sequence, this method converges to the true transition matrix Π. Jilkov and Li [9] discuss the identification of the transition matrices in the context of Markov Jump systems, providing multiple algorithms that can identify Π using noisy measurements that are indirect observations of the transitions; in one of these approaches, a renormalization ensures that the probability estimates sum to unity. Jaulmes et al. [7], [8] study this problem in an active estimation context using Partially Observable Markov Decision Processes (POMDPs). Marbach [14] considers this problem when the transition probabilities depend on a parameter vector. Borkar and Varaiya [5] also treat the adaptation problem in terms of a single parameter; namely, the true transition probability model is assumed to be a function of a single parameter a belonging to a finite set A. Konda and Tsitsiklis [10] consider the problem of slowly-varying Markov Chains in the context of reinforcement learning. Sato [18] considers this problem and shows asymptotic convergence of the probability estimates in the context of dual control. Kumar [12] also considered the adaptation problem. If the Markov Chain Πt is changing over time, classical ML or MAP estimators will generally fail to respond quickly to changes in the model.
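As a concrete illustration of the count-based Bayesian scheme described above, the following sketch updates a Dirichlet prior on one row of the transition matrix with observed transition counts and returns the mode of the resulting posterior (the MAP estimate). The function name and the uniform prior are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def dirichlet_map_row(alpha_prior, counts):
    """MAP estimate of one transition-matrix row under a Dirichlet prior.

    alpha_prior: prior Dirichlet parameters, one per next-state (all > 1
                 so the posterior mode is well defined in the interior)
    counts:      observed transition counts out of this state
    """
    alpha = np.asarray(alpha_prior, dtype=float) + np.asarray(counts, dtype=float)
    # Mode of Dirichlet(alpha): (alpha_j - 1) / (sum(alpha) - K)
    return (alpha - 1.0) / (alpha.sum() - alpha.size)

# Example: two-state chain, uniform prior alpha = (2, 2); from state 0 we
# observe 8 self-transitions and 2 transitions to state 1.
row_hat = dirichlet_map_row([2.0, 2.0], [8, 2])  # -> [0.75, 0.25]
```

With conjugacy, the update is just count accumulation; this is why the MAP estimate reduces to a (smoothed) empirical transition frequency.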
The intuition behind this is that since these estimators keep track of all the transitions that have ever occurred, a large number of new transitions will be required to detect the change and converge to the new model. Hence, new estimators are required to compensate for the inherent delay of the classical techniques. Note that if the dynamics of the transition matrix were available to the estimator designer, they could be embedded directly in the estimator. For example, if the transition matrix were known to switch between two systems according to a probabilistic switching schedule, or if the switching time were a random variable with known statistics, this information could enhance the performance of any estimator. However, in a more general setting, it is unlikely that this information would be available to the estimator designer. This paper proposes a new technique to speed up the estimator response that does not require information on the dynamics of the uncertain transition model. First, recursions for the mean and variance of the Dirichlet distribution are derived; these are equivalent to a mean-variance interpretation of classical ML or MAP estimation techniques. Importantly, however, the similarity of these recursions to Kalman filter-based parameter estimation techniques shows that the mean-variance estimator does not incorporate any knowledge of the parameter (or transition matrix) dynamics, and therefore results in a stationary prediction step. To compensate for this, the responsiveness of the estimator can be improved by adding an artificial pseudonoise to the variance, which is implemented by scaling the variance [15]. Scaling the

Proceedings of the 47th IEEE Conference on Decision and Control, Cancun, Mexico, Dec. 9-11, 2008
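The fading-of-past-observations idea can be illustrated with a discounted count update: multiplying the accumulated Dirichlet parameters by a factor less than one before each new observation down-weights old transitions, analogous to pseudonoise addition (variance inflation) in Kalman filtering. The helper name and the specific fading factor below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def fading_count_update(alpha, next_state, lam=0.9):
    """One discounted Dirichlet update for a single transition-matrix row.

    lam < 1 geometrically fades old transition counts, so the estimator
    can track a switching transition model instead of averaging over
    the entire history.
    """
    alpha = lam * np.asarray(alpha, dtype=float)
    alpha[next_state] += 1.0  # count the newly observed transition
    return alpha

# A switching chain: transitions from this state go to state 0 for a long
# stretch, then abruptly start going to state 1.
alpha = np.ones(2)
for s in [0] * 50 + [1] * 10:
    alpha = fading_count_update(alpha, s, lam=0.9)
mean_est = alpha / alpha.sum()  # posterior-mean estimate of the row
```

After only ten post-switch observations the discounted estimate already favors state 1, whereas an undiscounted count estimator would still assign it probability below 1/5.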