IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 10, OCTOBER 2004 1843
Single-Run Gradient Estimation Via Measure-Valued
Differentiation
Bernd Heidergott and Arie Hordijk
Abstract—We show how single-run-based measure-valued differentiation gradient estimators can be obtained. The key idea is to apply a change of measure a posteriori to the mathematical analysis of the derivative. From the point of view of the likelihood ratio method, we show that likelihood-ratio-type gradient estimators can be applied in situations where the mathematical conditions needed for applying a likelihood ratio analysis are not met.
Index Terms—Gradient estimation, likelihood ratios, measure-valued
differentiation, Markov chains.
I. INTRODUCTION
Many real-world systems in the area of telecommunication, manu-
facturing, and transportation can be modeled as (general state-space)
Markov chains. Deriving efficient gradient estimators for Markov
chains is important for stochastic optimization of such systems.
Measure-valued differentiation (MVD) is a mathematical framework for computing derivatives of, and gradient estimators for, Markov kernels. Specifically, in [7] a product rule of MVD for general state-space Markov chains has been established, and in [6] sufficient conditions are provided for the unique stationary distribution of a general state-space Markov chain to have a measure-valued derivative. Derivative formulas obtained via the MVD approach can be translated into unbiased gradient estimators. Unfortunately, the MVD estimator, as introduced for example in [8], involves splitting a sample path into two subpaths, called the positive and negative versions of the Markov chain, respectively, or phantoms. This splitting has to be applied at every transition of the process, which causes a considerable computational burden when running an MVD estimator. For geometrically ergodic chains, combining the splitting technique with an appropriate coupling leads to geometrically bounded cycle lengths of the gradient estimator for stationary characteristics; see [6].
The likelihood ratio (or score function) approach to gradient estimation for general state-space Markov chains is based on differentiating densities rather than measures; see [4], [3], [9], and [10]. While this approach is in general more restrictive than MVD, it enjoys the nice property that likelihood ratio gradient estimators can be evaluated by observing a single sample path: no splitting is involved. This reduces the computational effort of estimating a gradient via a likelihood ratio estimator compared to an MVD estimator.
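The single-run nature of the likelihood ratio estimator can be illustrated on a toy example (our illustration, not taken from this note): for $X \sim \mathrm{Exp}(\theta)$ with density $f_\theta(x) = \theta e^{-\theta x}$, the score is $\frac{\partial}{\partial\theta}\log f_\theta(x) = 1/\theta - x$, so $\frac{d}{d\theta}E[g(X)] = E[g(X)(1/\theta - X)]$. A minimal sketch in Python (function names and setup are ours):

```python
import numpy as np

def lr_gradient(g, theta, n=200_000, seed=0):
    """Likelihood-ratio (score function) estimate of d/dtheta E[g(X)]
    for X ~ Exp(theta) with density f_theta(x) = theta * exp(-theta * x).

    Score: d/dtheta log f_theta(x) = 1/theta - x.
    A single batch of samples suffices; no path splitting is needed.
    """
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0 / theta, size=n)
    return np.mean(g(x) * (1.0 / theta - x))

theta = 2.0
est = lr_gradient(lambda x: x, theta)
# exact value: d/dtheta E[X] = d/dtheta (1/theta) = -1/theta^2 = -0.25 here
print(est)
```

For $g(x) = x$ the exact derivative is $-1/\theta^2$, so the estimate should be close to $-0.25$ for $\theta = 2$.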
It is widely believed that this contrast between the estimators, "single-run based" on the one side and "splitting sample paths" on the other side, is a generic distinction between the two methods. In this note, we show that this is not the case. Specifically, we show that any MVD formula can be interpreted in a likelihood ratio manner by introducing appropriate Radon–Nikodym derivatives, which are obtained via a convex combination [to be defined in (6)] of the positive and negative parts of the MVD derivative of the Markov kernel.
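As a preview of this construction, the idea can be sketched on a simple one-step example (our illustration, with our own notation): suppose the MVD triple has positive and negative parts with densities $f^+$ and $f^-$. Sampling from the convex combination $\mu^* = \tfrac12\mu^+ + \tfrac12\mu^-$ and weighting each sample by the Radon–Nikodym derivative $(f^+ - f^-)/(\tfrac12 f^+ + \tfrac12 f^-)$ turns the difference of two expectations into a single weighted sample average. For $\mathrm{Exp}(\theta)$, a standard MVD triple is $c_\theta = 1/\theta$, $\mu^+ = \mathrm{Exp}(\theta)$, $\mu^- = \mathrm{Gamma}(2, \theta)$:

```python
import numpy as np

def single_run_mvd(g, theta, n=200_000, seed=1):
    """Single-run MVD estimate of d/dtheta E[g(X)] for X ~ Exp(theta).

    MVD triple (standard result): c = 1/theta,
      positive part mu+ = Exp(theta),      density f+(x) = theta * exp(-theta*x)
      negative part mu- = Gamma(2, theta), density f-(x) = theta^2 * x * exp(-theta*x).
    We sample from the mixture mu* = 0.5*mu+ + 0.5*mu- and weight each draw
    by the Radon-Nikodym derivative (f+ - f-)/(0.5*f+ + 0.5*f-).
    """
    rng = np.random.default_rng(seed)
    pick_pos = rng.random(n) < 0.5  # mixture component indicator
    x = np.where(pick_pos,
                 rng.exponential(scale=1.0 / theta, size=n),
                 rng.gamma(shape=2.0, scale=1.0 / theta, size=n))
    f_pos = theta * np.exp(-theta * x)
    f_neg = theta**2 * x * np.exp(-theta * x)
    w = (f_pos - f_neg) / (0.5 * f_pos + 0.5 * f_neg)
    return (1.0 / theta) * np.mean(g(x) * w)
```

Since the mixture density cancels under the weight, the average is an unbiased estimate of $c_\theta(\int g\,d\mu^+ - \int g\,d\mu^-)$, obtained from a single stream of samples; for $g(x) = x$ and $\theta = 2$ it should be close to $-1/\theta^2 = -0.25$.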
Manuscript received July 7, 2003; revised June 7, 2004. Recommended by
Associate Editor Y. Wardi.
B. Heidergott is with the Vrije Universiteit Amsterdam and Tinbergen
Institute, 1081 HV Amsterdam, The Netherlands (e-mail: bheidergott@
feweb.vu.nl).
A. Hordijk is with the Leiden University, Mathematical Institute, 2300 RA
Leiden, The Netherlands (e-mail: hordijk@math.leidenuniv.nl).
Digital Object Identifier 10.1109/TAC.2004.835588
In this note, we address representations for the derivative of a general state-space Markov chain. Elaborating on appropriate product rules of differentiation, the extension of our results to finite-horizon experiments is straightforward. Indeed, for likelihood ratios this extension follows readily from the product rule of differentiation for real-valued mappings, and for MVD we resort to the product rule of MVD established in [6]. Moreover, derivative formulas for the unique stationary distribution of a Markov chain are given via likelihood ratios in [4] and via MVD in [6].
The note is organized as follows. Section II introduces the basic concepts of MVD. Section III presents the likelihood ratio approach. In Section IV, the combination of both approaches is discussed; finally, we address the problem of finding the best possible way of constructing a single-run MVD estimator.
II. MVD
This section provides a brief review of the basic concepts of measure-valued differentiation. For details we refer to [5]–[8]. Let $(S, \mathcal{S})$ be a Polish measurable space. Let $\mathcal{M}(S, \mathcal{S})$ denote the set of finite (signed) measures on $(S, \mathcal{S})$ and $\mathcal{M}_1(S, \mathcal{S})$ that of probability measures on $(S, \mathcal{S})$.

A mapping $Q$ on $S \times \mathcal{S}$ is called a (homogeneous) transition kernel on $(S, \mathcal{S})$ if a) $Q(s; \cdot) \in \mathcal{M}(S, \mathcal{S})$ for all $s \in S$, and b) $Q(\cdot\,; A)$ is measurable for all $A \in \mathcal{S}$. If, in condition a), $\mathcal{M}(S, \mathcal{S})$ can be replaced by $\mathcal{M}_1(S, \mathcal{S})$, then $Q$ is called a Markov kernel on $(S, \mathcal{S})$. Denote the set of transition kernels on $(S, \mathcal{S})$ by $\mathcal{K}(S, \mathcal{S})$ and the set of Markov kernels on $(S, \mathcal{S})$ by $\mathcal{K}_1(S, \mathcal{S})$.

We consider a family of Markov kernels $\{P_\theta : \theta \in \Theta\}$ on $(S, \mathcal{S})$, with $\Theta \subset \mathbb{R}$. Let $\mathcal{D}$ denote the set of measurable mappings $g : S \to \mathbb{R}$ such that $\int_S g(u)\, P_\theta(s; du)$ is finite for all $s \in S$ and $\theta \in \Theta$.

For $\theta \in \Theta$, we call $P_\theta$ differentiable at $\theta$ with respect to $\mathcal{D}$, or $\mathcal{D}$-differentiable for short, if for any $s \in S$ a $P'_\theta(s; \cdot) \in \mathcal{M}(S, \mathcal{S})$ exists such that for any $g \in \mathcal{D}$

$$\frac{d}{d\theta} \int_S g(u)\, P_\theta(s; du) = \int_S g(u)\, P'_\theta(s; du). \qquad (1)$$
Notice that $P'_\theta$ fails to be a Markov kernel, which poses the problem of sampling from $P'_\theta$. In order to facilitate simulation, we introduce the notion of a $\mathcal{D}$-derivative, which extends the concept of a weak derivative, the latter being defined only for bounded continuous performance mappings.

Let $P_\theta$ be $\mathcal{D}$-differentiable at $\theta$. Any triple $(c_\theta, P_\theta^+, P_\theta^-)$, with $P_\theta^+, P_\theta^- \in \mathcal{K}_1(S, \mathcal{S})$ and $c_\theta$ a measurable mapping from $S$ to $\mathbb{R}$, that satisfies for all $g \in \mathcal{D}$ that

$$\int_S g(u)\, P'_\theta(s; du) = c_\theta(s) \left( \int_S g(u)\, P_\theta^+(s; du) - \int_S g(u)\, P_\theta^-(s; du) \right) \qquad (2)$$

is called a $\mathcal{D}$-derivative of $P_\theta$. The kernel $P_\theta^+$ is called the positive part and the kernel $P_\theta^-$ the negative part of $P'_\theta$, respectively. Sufficient conditions for the existence of a $\mathcal{D}$-derivative can be found in [5].
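As a concrete instance of (2) (our illustration, not part of the note): for the exponential density one has $\frac{\partial}{\partial\theta}\,\theta e^{-\theta x} = \frac{1}{\theta}\left(\theta e^{-\theta x} - \theta^2 x\, e^{-\theta x}\right)$, so $(1/\theta,\ \mathrm{Exp}(\theta),\ \mathrm{Gamma}(2, \theta))$ is a $\mathcal{D}$-derivative triple for $\mathrm{Exp}(\theta)$. The classical MVD estimator then differences samples drawn from the positive and negative parts:

```python
import numpy as np

def mvd_gradient(g, theta, n=200_000, seed=2):
    """Classical (two-sample) MVD estimate of d/dtheta E[g(X)], X ~ Exp(theta),
    using the triple c = 1/theta, mu+ = Exp(theta), mu- = Gamma(2, theta):
        d/dtheta E[g(X)] = (1/theta) * (E_{mu+}[g(X)] - E_{mu-}[g(X)]).
    Note the two separate sample streams; this is the splitting that the
    single-run representation developed in this note avoids.
    """
    rng = np.random.default_rng(seed)
    x_pos = rng.exponential(scale=1.0 / theta, size=n)        # samples from mu+
    x_neg = rng.gamma(shape=2.0, scale=1.0 / theta, size=n)   # samples from mu-
    return (np.mean(g(x_pos)) - np.mean(g(x_neg))) / theta
```

For $g(x) = x$ the two expectations are $1/\theta$ and $2/\theta$, so the estimator targets $-1/\theta^2$, i.e., about $-0.25$ for $\theta = 2$.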
Example 1: Let $S = [0, \infty)$ and let $\mathcal{S}$ denote the Borel field on $S$. For $\theta \in \Theta$, let $X_\theta$ denote the Markov chain on $(S, \mathcal{S})$ with transition probability