J. R. Statist. Soc. B (1995) 57, No. 4, pp. 705-720

A Class of Bayesian Models for Optimal Exploration

By K. D. GLAZEBROOK and R. J. BOYS†

University of Newcastle upon Tyne, UK

[Received July 1994. Revised November 1994]

SUMMARY

Each of several locations contains an unknown number of objects of value. A single search of a location will discover some of these objects which are then removed. Discoveries yield rewards. A 'distribution of effort' problem is posed concerning how to explore the locations optimally, i.e. to maximize the expected return from discoveries made. This is formulated as a Bayes sequential decision problem for which index policies are optimal. A natural simple case is one in which, conditionally on the number of undiscovered objects at a location N, the number of discoveries made in a single search is binomial B(N, p) where p is a detection rate. For this case, we can gain considerable insight into how model structure relates to policy structure. The tail behaviour of the priors for the number of objects at each location plays an important role.

Keywords: BAYES SEQUENTIAL DECISION RULE; CONJUGATE PRIORS; DYNAMIC PROGRAMMING; GITTINS INDEX; OPTIMAL SEARCH

1. INTRODUCTION

Each of L locations contains an unknown number of objects of value. A single search of a location will lead to the discovery of some of the objects which are there. These are then removed from the location and a consequential (net) reward is earned. Our 'distribution of effort' search problem envisages a situation in which (at most) one location may be searched at a time and seeks a rule for deciding which (if any) location should be explored next to maximize some measure of total reward earned from exploration. Plainly, features of a location which are key to such decisions are (a) how many objects we believe remain to be discovered there, (b) how easy it is to search and (c) how valuable discoveries there are.
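As an illustration of feature (a), the binomial detection model of the Summary implies a simple update of beliefs after one search. If prior[n] = P(N = n) and a search with detection rate p finds k objects, the posterior probability that m objects remain is proportional to prior[m + k] C(m + k, k) p^k (1 - p)^m. The following is a minimal sketch of that update in Python. The function names, the truncation point n_max and the Poisson prior are illustrative assumptions of ours, not choices taken from the paper.

```python
import math


def poisson_prior(lam, n_max):
    """Truncated Poisson(lam) prior on N = 0..n_max, renormalised (illustrative choice)."""
    weights = [math.exp(-lam) * lam**n / math.factorial(n) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_remaining(prior, k, p):
    """Posterior over the number of objects still undiscovered after one search
    that found k objects, where prior[n] = P(N = n) and p is the detection rate."""
    # P(M = m | k found) is proportional to
    #   prior[m + k] * C(m + k, k) * p**k * (1 - p)**m
    weights = [
        prior[m + k] * math.comb(m + k, k) * p**k * (1 - p) ** m
        for m in range(len(prior) - k)
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("observed count has zero probability under the prior")
    return [w / total for w in weights]


def mean(dist):
    """Mean of a distribution given as a list of probabilities on 0, 1, 2, ..."""
    return sum(i * q for i, q in enumerate(dist))


# Hypothetical numbers: prior mean of 6 objects, detection rate 0.4, 3 found.
prior = poisson_prior(lam=6.0, n_max=80)
post = posterior_remaining(prior, k=3, p=0.4)
print(round(mean(prior), 3), round(mean(post), 3))
```

With a Poisson prior the number of objects remaining is again (up to truncation) Poisson, with mean reduced by the factor 1 - p whatever the value of k. This is the standard thinning property of the Poisson distribution. A prior with a heavier tail would instead revise its estimate upwards after a large haul.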
Each of these features is reflected in the class of Bayes sequential decision models that we use to analyse the exploration problems of interest to us. These are described in Section 2. Most crucially, we adopt a Bayesian approach in describing our beliefs about the number of objects at each location in advance of any exploration by means of a set of (independent) prior distributions. The problem of determining a Bayes rule can be formulated as a multiarmed bandit to which the theory of Gittins indexation may be applied. See Gittins and Jones (1974) and Gittins (1989). This theory indicates that with each location may be associated a calibrating index to

†Address for correspondence: Department of Mathematics and Statistics, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK.

© 1995 Royal Statistical Society 0035-9246/95/57705