Please cite this article in press as: Valenzuela, P.E., et al., Analysis of averages over distributions of Markov processes. Automatica (2018),
https://doi.org/10.1016/j.automatica.2018.09.016.
Technical communique
Analysis of averages over distributions of Markov processes✩
Patricio E. Valenzuela *, Cristian R. Rojas, Håkan Hjalmarsson
Department of Automatic Control and ACCESS Linnaeus Center, School of Electrical Engineering, KTH Royal Institute of Technology, SE-100 44 Stockholm,
Sweden
Article info
Article history:
Received 28 July 2017
Received in revised form 25 March 2018
Accepted 19 July 2018
Available online xxxx
Keywords:
System identification
Input design
Markov chains
Abstract
In problems of optimal control of Markov decision processes and optimal design of experiments, the
occupation measure of a Markov process is designed in order to maximize a specific reward function.
When the memory of such a process is too long, or the process is non-Markovian but mixing, it makes
sense to approximate it by that of a shorter-memory Markov process. This note provides a specific bound
for the approximation error introduced in these schemes. The derived bound is then applied to the
proposed solution of a recently introduced approach to optimal input design for nonlinear systems.
© 2018 Elsevier Ltd. All rights reserved.
1. Introduction
The field of Markov Decision Processes (MDP) is a very mature
area of research (Puterman, 1994), where the goal is usually to
design the action policy in order to maximize an average or dis-
counted reward function. There are two main, dual approaches
to solve these problems, namely, through dynamic programming
(Bellman, 1957) or via occupation measures (Borkar, 1988) (which
correspond to the stationary probabilities of the joint action–state
pair); the latter approach has some advantages over the former,
for example, for MDP problems subject to average constraints
(Altman, 1999).
The solution scheme for MDPs based on occupation measures
has found applications in other control-related problems such as
nonlinear optimal control (Lasserre, Prieur, & Henrion, 2005;
Vaidya, Mehta, & Shanbhag, 2010), stability analysis (Vaidya &
Mehta, 2008) and optimal input design for nonlinear systems
(Valenzuela, Rojas, & Hjalmarsson, 2015).
For some of these problems, in particular for input design
(Valenzuela et al., 2015), the Markovian assumption is actually an
approximation, in the sense that the stochastic process (the output
of a nonlinear system) is a mixing process, so the conditional distribution of its current value, given the entire past, is approximately equal to the conditional distribution given a finite number of values of its most recent past. This approximation is indeed essential for the application of the occupation measure approach.
✩ This work was supported by the Swedish Research Council under contracts 621-2011-5890 and 621-2009-4017. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor A. Pedro Aguiar under the direction of Editor André L. Tits.
* Corresponding author.
E-mail addresses: pva@kth.se (P.E. Valenzuela), crro@kth.se (C.R. Rojas), hjalmars@kth.se (H. Hjalmarsson).
In this paper we revisit the validity of this Markovian approxi-
mation. In particular, we consider a mixing process corresponding
to the output of an exponentially forgetting system (Ljung, 1978)
driven by a Markov process of finite memory, and we provide rig-
orous bounds on the difference between the mean of a function of
this process and that of its Markovian approximation, as a function
of the length of its memory. The bound obtained tends to zero as
the memory length grows to infinity, establishing the validity of
the Markovian approximation. Also, we apply this bound to the
input design approach in Valenzuela et al. (2015), and provide a
bound on the accuracy of that procedure.
The structure of this article is as follows. Section 2 presents
preliminaries on Markov processes. Section 3 introduces the main
results of this manuscript. Finally, Section 4 presents concluding
remarks.
2. Preliminaries on Markov processes
This section introduces the elements from the theory of Markov
processes required in the main result of this note (Theorem 1).
A Markov process is a stochastic process whose conditional probability distribution, given its entire past, depends with probability one only on its most recent past (Doob, 1953, p. 80). In the following, we
consider a Markov process defined for all t ≥ 0 as
x_{t+1} ∼ p(x_{t+1} | x_t),    (1)
where p is a conditional probability mass function, x_t ∈ X for all t ≥ 0, and X is a finite set. Based on (1), we recursively define the
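As an illustration (not part of the paper), the recursion (1) can be simulated for a hypothetical three-state chain, and the empirical occupation measure, i.e. the fraction of time the trajectory spends in each state, can be estimated from a long sample path. The transition probabilities below are invented for the sketch.

```python
import random
from collections import Counter

# Hypothetical 3-state chain on X = {0, 1, 2}; row i of P is the
# conditional pmf p(x_{t+1} = . | x_t = i). Values are illustrative only.
P = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

def simulate(P, x0, T, rng):
    """Draw x_0, ..., x_T recursively via x_{t+1} ~ p(. | x_t), as in (1)."""
    traj = [x0]
    for _ in range(T):
        traj.append(rng.choices(range(len(P)), weights=P[traj[-1]])[0])
    return traj

rng = random.Random(0)
traj = simulate(P, x0=0, T=50_000, rng=rng)

# Empirical occupation measure: fraction of time spent in each state.
counts = Counter(traj)
occ = [counts[i] / len(traj) for i in range(len(P))]
```

For an ergodic chain such as this one, `occ` converges to the stationary distribution as the trajectory length grows, which is the quantity that occupation-measure formulations of MDPs and input design optimize over.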