J. R. Statist. Soc. B (1995)
57, No.4, pp. 705-720
A Class of Bayesian Models for Optimal Exploration
By K. D. GLAZEBROOK and R. J. BOYS†
University of Newcastle upon Tyne, UK
[Received July 1994. Revised November 1994]
SUMMARY
Each of several locations contains an unknown number of objects of value. A single search
of a location will discover some of these objects which are then removed. Discoveries
yield rewards. A 'distribution of effort' problem is posed concerning how to explore the
locations optimally, i.e. to maximize the expected return from discoveries made. This is
formulated as a Bayes sequential decision problem for which index policies are optimal.
A natural simple case is one in which, conditionally on the number of undiscovered objects
at a location N, the number of discoveries made in a single search is binomial B(N, p)
where p is a detection rate. For this case, we can gain considerable insight into how model
structure relates to policy structure. The tail behaviour of the priors for the number of
objects at each location plays an important role.
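
The binomial case can be illustrated with a minimal sketch of the belief update after one search. It assumes a prior for N given on a finite grid, which is an assumption of this sketch rather than of the paper; the function name `posterior_remaining` and the numerical values are hypothetical and chosen for illustration only.

```python
from math import comb


def posterior_remaining(prior, p, found):
    """Posterior for the number of objects still undiscovered after one search.

    prior[n] is the prior probability that N = n objects are present
    (finite grid: an assumption of this sketch, not of the paper).
    Given N = n, the number found in the search is Binomial(n, p).
    Returns a list whose entry m is P(N - found = m | found).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("detection rate p must lie in [0, 1]")
    if found < 0 or found >= len(prior):
        raise ValueError("found must index a point of the prior grid")
    # Bayes' theorem: prior weight times binomial likelihood, for n >= found.
    weights = [
        prior[n] * comb(n, found) * p**found * (1.0 - p) ** (n - found)
        for n in range(found, len(prior))
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation has zero probability under the prior")
    # Removing the discovered objects shifts the index from n to n - found.
    return [w / total for w in weights]


# Hypothetical example: prior uniform on {0, 1, 2}, detection rate 0.5,
# one object found in the search.
example = posterior_remaining([1 / 3, 1 / 3, 1 / 3], p=0.5, found=1)
```

For a Poisson prior with mean λ, the same calculation returns a Poisson distribution with mean λ(1 - p) for the number of objects remaining. This is a standard thinning property and gives a convenient numerical check.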
Keywords: BAYES SEQUENTIAL DECISION RULE; CONJUGATE PRIORS; DYNAMIC PROGRAMMING;
GITTINS INDEX; OPTIMAL SEARCH
1. INTRODUCTION
Each of L locations contains an unknown number of objects of value. A single
search of a location will lead to the discovery of some of the objects which are
there. These are then removed from the location and a consequential (net) reward
is earned. Our 'distribution of effort' search problem envisages a situation in which
(at most) one location may be searched at a time and seeks a rule for deciding which
(if any) location should be explored next to maximize some measure of total reward
earned from exploration.
Plainly, features of a location which are key to such decisions are
(a) how many objects we believe remain to be discovered there,
(b) how easy it is to search and
(c) how valuable discoveries there are.
Each of these features is reflected in the class of Bayes sequential decision models
that we use to analyse the exploration problems of interest to us. These are described
in Section 2. Most crucially, we adopt a Bayesian approach in describing our beliefs
about the number of objects at each location in advance of any exploration by
means of a set of (independent) prior distributions. The problem of determining
a Bayes rule can be formulated as a multiarmed bandit to which the theory of Gittins
indexation may be applied. See Gittins and Jones (1974) and Gittins (1989). This
theory indicates that with each location may be associated a calibrating index to
†Address for correspondence: Department of Mathematics and Statistics, University of Newcastle upon Tyne,
Newcastle upon Tyne, NE1 7RU, UK.
© 1995 Royal Statistical Society 0035-9246/95/57705