Estimation of Non-stationary Markov Chain Transition Models
L. F. Bertuccelli and J. P. How
Aerospace Controls Laboratory
Massachusetts Institute of Technology
{lucab, jhow}@mit.edu
Abstract— Many decision systems rely on a precisely known
Markov Chain model to guarantee optimal performance, and
this paper considers the online estimation of unknown, non-
stationary Markov Chain transition models with perfect state
observation. Using a prior Dirichlet distribution on the
uncertain rows, we derive a mean-variance equivalent of the
Maximum A Posteriori (MAP) estimator. This recursive mean-
variance estimator extends previous methods that recompute
the moments at each time step using observed transition counts.
It is shown that this mean-variance estimator responds slowly
to changes in transition models (especially switching models)
and a modification that uses ideas of pseudonoise addition
from classical filtering is used to speed up the response of the
estimator. This new, discounted mean-variance estimator has
the intuitive interpretation of fading previous observations and
provides a link to fading techniques used in Hidden Markov
Model estimation. Our new estimation technique is both faster
and has lower error than alternative estimation techniques,
such as finite-memory estimators.
I. INTRODUCTION
Many decision processes, such as Markov Decision Pro-
cesses (MDPs) and Jump Markov Linear systems, are mod-
eled as a probabilistic process driven by a Markov Chain.
The true parameters of the Markov Chain are frequently
unavailable to the modeler, and many researchers have re-
cently been addressing the issue of robust performance in
these decision systems [4], [6], [13], [16]. However, a large
body of research has also been devoted to the identification
of the Markov Chain using available observations. With few
exceptions (such as the signal processing community [11],
[17]), most of this research has addressed the case of a
unique, stationary model.
When the transition matrix Π of a Markov Chain is
stationary, classical maximum likelihood (ML) schemes [9],
[17] can be used to recursively obtain the best estimate
Π̂ of the transition matrix. Typical Bayesian methods assume
a prior Dirichlet distribution on each row of the
transition matrix, and exploit the conjugacy property of the
Dirichlet distribution with the multinomial distribution to
recursively compute Π̂. This technique amounts to evaluating
the empirical frequency of the transitions to obtain a ML
or Maximum A Posteriori (MAP) estimate of the transition
matrix. In the limit of an infinite observation sequence, this
method converges to the true transition matrix, Π. Jilkov and
Li [9] discuss the identification of the transition matrices
in the context of Markov Jump systems, providing multiple
algorithms that can identify Π using noisy measurements
that are indirect observations of the transitions. In one of
these approaches, a renormalization is used to ensure that
the probability estimates sum to unity. Jaulmes et al. [7],
[8] study this problem in an active estimation context using
Partially Observable Markov Decision Processes (POMDPs).
Marbach [14] considers this problem, when the transition
probabilities depend on a parameter vector. Borkar and
Varaiya [5] treat the adaptation problem in terms of a single
parameter as well; namely, the true transition probability
model is assumed to be a function of a single parameter a
belonging to a finite set A. Konda and Tsitsiklis [10] consider
the problem of slowly-varying Markov Chains in the context
of reinforcement learning. Sato [18] considers this problem
in the context of dual control and shows asymptotic convergence
of the probability estimates. Kumar [12] also
considered the adaptation problem.
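The conjugate-prior update described above reduces to simple bookkeeping of transition counts. A minimal sketch of this count-based MAP estimate (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def map_transition_estimate(counts, alpha):
    """MAP estimate of each row of the transition matrix, given
    observed transition counts and Dirichlet prior parameters.

    counts[i, j]: number of observed i -> j transitions.
    alpha[i, j]:  Dirichlet prior parameters for row i (alpha > 1
                  assumed, so the MAP mode lies in the interior).
    """
    # Conjugacy of the Dirichlet with the multinomial: the posterior
    # for row i is Dirichlet(alpha[i] + counts[i]).
    post = alpha + counts
    # Mode of Dirichlet(a): (a_j - 1) / (sum_j a_j - K) per row.
    K = counts.shape[1]
    return (post - 1.0) / (post.sum(axis=1, keepdims=True) - K)

# Example: 2-state chain, Dirichlet(2, 2) prior on each row.
alpha = 2.0 * np.ones((2, 2))
counts = np.array([[8.0, 2.0],   # 10 transitions out of state 0
                   [1.0, 9.0]])  # 10 transitions out of state 1
Pi_hat = map_transition_estimate(counts, alpha)
print(Pi_hat)  # each row sums to one
```

As the number of observed transitions grows, the prior is swamped by the counts and the estimate approaches the empirical transition frequencies, consistent with the ML convergence noted above.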
If the Markov Chain, Π_t, is changing over time, classical
ML or MAP estimators will generally fail to respond quickly
to changes in the model. The intuition is that
since these estimators keep track of all the transitions that
have occurred, a large number of new transitions will be
required for change detection and convergence to the new
model.
model. Hence, new estimators are required to compensate
for the inherent delay that will occur in classical techniques.
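This inertia, and the fading idea developed later in the paper, can be illustrated by discounting past counts by a factor λ ∈ (0, 1) before each update. The sketch below is illustrative only (the discount value, the switching scenario, and all names are assumptions, not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def track_row(lam, T=400):
    """Track the estimate of P(stay in state 0) for a chain whose
    true self-transition probability switches from 0.9 to 0.2 at T/2.
    lam = 1.0 recovers the classical count-based estimator;
    lam < 1 exponentially fades previous observations."""
    counts = np.ones(2)  # Dirichlet(1, 1) prior on row 0
    est = []
    for t in range(T):
        p_true = 0.9 if t < T // 2 else 0.2
        j = 0 if rng.random() < p_true else 1  # observed transition
        counts *= lam                          # fade old observations
        counts[j] += 1.0
        est.append(counts[0] / counts.sum())   # posterior-mean estimate
    return est

slow = track_row(lam=1.0)   # classical estimator: sluggish
fast = track_row(lam=0.95)  # discounted estimator: tracks the switch
```

After the switch, the undiscounted estimate lingers near the average of the two regimes, while the discounted estimate converges toward the new value of 0.2, which is the behavior the estimators in this paper are designed to achieve.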
Note that if the dynamics of the transition matrix were
available to the estimator designer, they could be embedded
directly in the estimator. For example, if the transition matrix
were known to switch between two systems according to
a probabilistic switching schedule, or if the switching time
were a random variable with known statistics, these pieces of
information could enhance the performance of any estimator.
However, in a more general setting, it is unlikely that this
information would be available to the estimator designer.
This paper proposes a new technique to speed up the
estimator response that does not require information on the
dynamics of the uncertain transition model. First, recursions
for the mean and variance of the Dirichlet distribution are de-
rived; these are equivalent to a mean-variance interpretation
of classical ML or MAP estimation techniques. Importantly,
however, we use the similarity of these recursions to Kalman
filter-based parameter estimation techniques to notice that the
mean-variance estimator does not incorporate any knowledge
of the parameter (or transition matrix) dynamics, and
therefore results in a stationary prediction step. To compensate for
this, the responsiveness of the estimator can be improved
by adding an artificial pseudonoise to the variance which
is implemented by scaling the variance [15]. Scaling the
Proceedings of the 47th IEEE Conference on Decision and Control, Cancun, Mexico, Dec. 9-11, 2008. TuA02.4. 978-1-4244-3124-3/08/$25.00 ©2008 IEEE. 55