Pattern Recognition Vol. 12, pp. 405-413.
Pergamon Press Ltd 1980. Printed in Great Britain.
© Pattern Recognition Society.
0031-3203/80/1201-0405 $02.00/0
ERROR RATE ESTIMATION ON THE BASIS OF
POSTERIOR PROBABILITIES
S. GANESALINGAM and G. J. MCLACHLAN*
Department of Mathematics, University of Queensland, St. Lucia 4067, Australia
(Received 6 July 1979; in revised form 20 December 1979; received for publication 11 March 1980)
Abstract - The so-called posterior probability estimator, e, formed by averaging the minimum of the
posterior probabilities over a set of initial or additional observations (which need not be classified) is
considered in the context of estimating the overall actual error rate for the linear discriminant function
appropriate for two multivariate normal populations with a common covariance matrix. The bias of e is
examined by deriving asymptotic approximations under three different models, the normal, logistic, and
mixture models. The properties of e are investigated further by a series of simulation experiments for the
logistic and mixture models for which there are few other available estimators.
Linear discriminant function Posterior probabilities Actual error rate Normal
Logistic Mixture models
1. INTRODUCTION
Suppose that an object is to be allocated to one of two
possible populations, $\Pi_1$ or $\Pi_2$, on the basis of an
observation vector $x$ consisting of $p$ characteristics
associated with the object. Let $f_i(x)$ denote the multivariate
density function of $x$ in $\Pi_i$, which has a prior
probability $\pi_i$ $(i = 1, 2)$. The optimal rule of allocation,
$R$, assigns $x$ to $\Pi_1$ or $\Pi_2$ according as $\theta_1(x)$ is greater or
less than 0.5, where $\theta_1(x)$ is the posterior probability
that $x$ belongs to $\Pi_1$ and is given by

$$\theta_1(x) = \tau/(1 + \tau), \qquad (1.1)$$

where

$$\tau = \pi_1 f_1(x) / \{\pi_2 f_2(x)\},$$

and $\theta_2(x) = 1 - \theta_1(x)$. The rule is optimal in the sense
that it minimizes the overall error rate. In practice, the
density functions are either unknown or, if their form is
assumed known, then their parameters require estimation.
The usual procedure is to form the sample
allocation rule, $\hat{R}$, defined by replacing $f_i(x)$ with an
estimate, $\hat{f}_i(x)$, based on an initial sample of size $n = n_1 + n_2$
with $n_i$ observations belonging to $\Pi_i$ $(i = 1, 2)$.
The actual error rates are denoted by $P_i(\hat{R})$, which is
the conditional probability that $\hat{R}$ misallocates a
randomly chosen member of $\Pi_i$ (conditional on the
initial sample estimates of the population parameters).
The overall actual error rate, $P(\hat{R})$, is given by

$$P(\hat{R}) = \pi_1 P_1(\hat{R}) + \pi_2 P_2(\hat{R}). \qquad (1.2)$$
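
As an illustration of the optimal rule and the posterior probability defined above, the following is a minimal Python sketch. It assumes $p = 1$ and two normal densities with a common standard deviation; the function names, the parameter names, and the choice to send the tie $\theta_1(x) = 0.5$ to the second population are assumptions made for the example only. The ratio $\tau$ is evaluated on the log scale so that very small densities do not cause a division by zero.

```python
import math


def posterior_pi1(x, prior1, mean1, mean2, sd):
    """Posterior probability theta_1(x) = tau / (1 + tau) for p = 1.

    Assumes univariate normal densities with means mean1, mean2 and a
    common standard deviation sd (an illustrative special case).
    """
    prior2 = 1.0 - prior1
    # log tau = log(pi_1 / pi_2) + log f_1(x) - log f_2(x);
    # the normalising constants cancel because sd is common.
    log_tau = math.log(prior1 / prior2) + (
        (x - mean2) ** 2 - (x - mean1) ** 2
    ) / (2.0 * sd ** 2)
    # Evaluate tau / (1 + tau) without overflow for large |log tau|.
    if log_tau >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_tau))
    t = math.exp(log_tau)
    return t / (1.0 + t)


def allocate(x, prior1, mean1, mean2, sd):
    """Assign x to population 1 if theta_1(x) > 0.5, otherwise to population 2.

    Sending the tie theta_1(x) = 0.5 to population 2 is an arbitrary choice.
    """
    return 1 if posterior_pi1(x, prior1, mean1, mean2, sd) > 0.5 else 2
```

With equal priors the cutoff falls midway between the two means, so an observation at the first mean is allocated to the first population.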
The problem of estimating the error rates has been
considered by many authors, too numerous to reference
completely here; see Lachenbruch (1) and Toussaint (2)
for extensive bibliographies. Some more recent
references are Efron (3), Glick (4, 5), Goldstein and
Dillon (6), Lissack and Fu (17), McLachlan (7-9) and
Moore et al. (10). The estimators designed specifically
for normal populations with a common covariance
matrix include the plug-in estimator, $Q = G(-\tfrac{1}{2}D)$,
where $G$ is the standard normal distribution function and $D$ is
the estimated Mahalanobis distance between the two
populations, and its form corrected for bias, $Q_u$, which
reduces the bias to the third order only (11, 12). There
are also the so-called DS, O, OS, and $\bar{U}$ methods of
estimation due to Lachenbruch and Mickey (13).
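
The plug-in estimator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $p = 1$, so that the estimated Mahalanobis distance reduces to the absolute difference of the sample means divided by the pooled standard deviation, and a pooled variance with divisor $n_1 + n_2 - 2$. The function name and arguments are hypothetical.

```python
from statistics import NormalDist, mean, variance


def plug_in_error_rate(sample1, sample2):
    """Plug-in estimate G(-D/2) for two univariate samples (p = 1).

    G is the standard normal distribution function and D is the estimated
    Mahalanobis distance, here |mean1 - mean2| / pooled standard deviation.
    Each sample needs at least two observations.
    """
    n1, n2 = len(sample1), len(sample2)
    # Pooled estimate of the common variance (divisor n1 + n2 - 2).
    pooled_var = (
        (n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)
    ) / (n1 + n2 - 2)
    d = abs(mean(sample1) - mean(sample2)) / pooled_var ** 0.5
    return NormalDist().cdf(-0.5 * d)
```

For example, two samples with means 0 and 2 and pooled variance 1 give $D = 2$, and the estimate is $G(-1)$, roughly 0.159.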
In an excellent account of the estimation problem
Glick (5) devoted particular attention to the counting
estimators, including the apparent error rate or
resubstitution method (14) and its commonly used version
modified according to a leaving-one-out process, the U
method of Lachenbruch and Mickey (13). Although the
latter method is robust and nearly eliminates the bias
associated with the apparent error rate, Glick (5) has
pointed out that it may have a large variance for small
$n$ due to the fact that the observations are assigned
outright to a population during the estimation process.
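
The two counting estimators just described can be contrasted in a short Python sketch. It is a simplified illustration, not the procedure of any cited author: it assumes $p = 1$, equal prior probabilities and a common variance, in which case the sample linear discriminant rule reduces to assigning an observation to the population with the nearer sample mean. All function names are hypothetical.

```python
from statistics import mean


def classify(x, mean1, mean2):
    """Assign x to the population with the nearer sample mean (ties go to 2)."""
    return 1 if abs(x - mean1) < abs(x - mean2) else 2


def apparent_error_rate(sample1, sample2):
    """Resubstitution: reclassify the observations used to form the rule."""
    m1, m2 = mean(sample1), mean(sample2)
    errors = sum(classify(x, m1, m2) != 1 for x in sample1)
    errors += sum(classify(x, m1, m2) != 2 for x in sample2)
    return errors / (len(sample1) + len(sample2))


def leave_one_out_error_rate(sample1, sample2):
    """Leaving-one-out: classify each observation with a rule formed without it.

    Each sample needs at least two observations.
    """
    errors = 0
    for i, x in enumerate(sample1):
        rest = sample1[:i] + sample1[i + 1:]
        errors += classify(x, mean(rest), mean(sample2)) != 1
    for i, x in enumerate(sample2):
        rest = sample2[:i] + sample2[i + 1:]
        errors += classify(x, mean(sample1), mean(rest)) != 2
    return errors / (len(sample1) + len(sample2))
```

Both estimates are multiples of $1/n$ because every observation contributes either 0 or 1. With the samples `[0.0, 1.0, 1.9]` and `[2.1, 3.0, 4.0]` the apparent error rate is 0, while leaving one out misallocates the two observations nearest the cutoff and gives 2/6.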
An estimator which overcomes this problem is the
so-called posterior probability estimator considered by
Fukunaga and Kessell (15, 16), Lissack and Fu (17) and
Moore et al. (10). With this estimator each observation is
not assigned outright to a population; rather, it is
given an estimated probability of membership of each
population. In the present context this estimator takes
the form

$$e = \sum_{j=1}^{n} \min\{\hat{\theta}_1(z_j), \hat{\theta}_2(z_j)\}/n, \qquad (1.3)$$

* For correspondence contact Dr. McLachlan.

where $z_1, \ldots, z_n$ denote $n$ observations drawn from a