Pattern Recognition Vol. 12, pp. 405-413. Pergamon Press Ltd 1980. Printed in Great Britain. © Pattern Recognition Society. 0031-3203/80/1201-0405 $02.00/0

ERROR RATE ESTIMATION ON THE BASIS OF POSTERIOR PROBABILITIES

S. GANESALINGAM and G. J. MCLACHLAN*

Department of Mathematics, University of Queensland, St. Lucia 4067, Australia

(Received 6 July 1979; in revised form 20 December 1979; received for publication 11 March 1980)

Abstract - The so-called posterior probability estimator, $e$, formed by averaging the minimum of the posterior probabilities over a set of initial or additional observations (which need not be classified) is considered in the context of estimating the overall actual error rate for the linear discriminant function appropriate for two multivariate normal populations with a common covariance matrix. The bias of $e$ is examined by deriving asymptotic approximations under three different models, the normal, logistic, and mixture models. The properties of $e$ are investigated further by a series of simulation experiments for the logistic and mixture models for which there are few other available estimators.

Linear discriminant function    Posterior probabilities    Actual error rate    Normal    Logistic    Mixture models

1. INTRODUCTION

Suppose that an object is to be allocated to one of two possible populations, $\Pi_1$ or $\Pi_2$, on the basis of an observation vector $x$ consisting of $p$ characteristics associated with the object. Let $f_i(x)$ denote the multivariate density function of $x$ in $\Pi_i$, which has a prior probability $\pi_i$ ($i = 1, 2$). The optimal rule of allocation, $R$, assigns $x$ to $\Pi_1$ or $\Pi_2$ according as $\theta_1(x)$ is greater or less than 0.5, where $\theta_1(x)$ is the posterior probability that $x$ belongs to $\Pi_1$ and is given by

$$\theta_1(x) = \tau/(1 + \tau), \tag{1.1}$$

where $\tau = \pi_1 f_1(x)/\{\pi_2 f_2(x)\}$, and $\theta_2(x) = 1 - \theta_1(x)$. The rule is optimal in the sense that it minimizes the overall error rate.
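As a rough illustration of the allocation rule in (1.1), the following Python sketch computes the posterior probability of membership of the first population and applies the 0.5 threshold. The univariate normal densities, their means and standard deviations, the equal priors, and all function names are assumptions made for this example only; the paper itself treats general multivariate densities.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density, used here as an illustrative stand-in for f_i(x)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_pop1(x, prior1, f1, f2):
    """Posterior probability of population 1: tau / (1 + tau),
    with tau = prior1 * f1(x) / (prior2 * f2(x))."""
    prior2 = 1.0 - prior1
    num = prior1 * f1(x)
    den = prior2 * f2(x)
    # num / (num + den) is algebraically equal to tau / (1 + tau) and avoids
    # dividing by f2(x) alone. It still fails if both densities underflow to zero.
    return num / (num + den)


def allocate(x, prior1, f1, f2):
    """Rule R: population 1 if the posterior exceeds 0.5, otherwise population 2.
    Sending an exact tie to population 2 is an arbitrary choice made here."""
    return 1 if posterior_pop1(x, prior1, f1, f2) > 0.5 else 2


# Hypothetical populations: N(0, 1) and N(2, 1) with equal priors.
f1 = lambda x: normal_pdf(x, 0.0, 1.0)
f2 = lambda x: normal_pdf(x, 2.0, 1.0)

theta1_at_zero = posterior_pop1(0.0, 0.5, f1, f2)
label_at_zero = allocate(0.0, 0.5, f1, f2)
label_at_three = allocate(3.0, 0.5, f1, f2)
```

In this hypothetical setting the ratio of the two densities at $x = 0$ is $e^{2}$, so the posterior probability is $e^{2}/(1 + e^{2})$, roughly 0.88, and the observation is allocated to the first population.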
In practice, the density functions are either unknown or, if their form is assumed known, then their parameters require estimation. The usual procedure is to form the sample allocation rule, $\hat{R}$, defined by replacing $f_i(x)$ with an estimate, $\hat{f}_i(x)$, based on an initial sample of size $n = n_1 + n_2$ with $n_i$ observations belonging to $\Pi_i$ ($i = 1, 2$). The actual error rates are denoted by $P_i(\hat{R})$, where $P_i(\hat{R})$ is the conditional probability that $\hat{R}$ misallocates a randomly chosen member of $\Pi_i$ (conditional on the initial sample estimates of the population parameters). The overall actual error rate, $P(\hat{R})$, is given by

$$P(\hat{R}) = \pi_1 P_1(\hat{R}) + \pi_2 P_2(\hat{R}). \tag{1.2}$$

The problem of estimating the error rates has been considered by many authors, too numerous to reference completely here; see Lachenbruch(1) and Toussaint(2) for extensive bibliographies. Some more recent references are Efron,(3) Glick,(4,5) Goldstein and Dillon,(6) Lissack and Fu,(17) McLachlan,(7-9) and Moore et al.(10) The estimators designed specifically for normal populations with a common covariance matrix include the plug-in estimator, $Q = G(-\tfrac{1}{2}D)$, where $G$ is the standard normal distribution function and $D$ is the estimated Mahalanobis distance between the two populations, and its form corrected for bias, Qu, which reduces the bias to the third order only.(11,12) There are also the so-called DS, O, OS, and Ū methods of estimation due to Lachenbruch and Mickey.(13)
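The plug-in estimator and the weighted sum in (1.2) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' procedure: it takes the univariate case ($p = 1$), where the estimated Mahalanobis distance reduces to the absolute difference in sample means divided by the pooled standard deviation, and it uses the divisor $n_1 + n_2 - 2$ for the pooled variance. The function names and the sample values are hypothetical.

```python
import math
from statistics import mean


def std_normal_cdf(t):
    """G(t): the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def pooled_sd(sample1, sample2):
    """Pooled standard deviation with divisor n1 + n2 - 2 (an assumed convention)."""
    m1, m2 = mean(sample1), mean(sample2)
    ss = sum((x - m1) ** 2 for x in sample1) + sum((x - m2) ** 2 for x in sample2)
    return math.sqrt(ss / (len(sample1) + len(sample2) - 2))


def plug_in_estimate(sample1, sample2):
    """Q = G(-D/2), with D the univariate estimated Mahalanobis distance."""
    d = abs(mean(sample1) - mean(sample2)) / pooled_sd(sample1, sample2)
    return std_normal_cdf(-0.5 * d)


def overall_error_rate(prior1, p1, p2):
    """Equation (1.2): prior-weighted sum of the two population error rates."""
    return prior1 * p1 + (1.0 - prior1) * p2


# Hypothetical initial samples from the two populations.
sample1 = [-1.0, 0.0, 1.0]
sample2 = [1.0, 2.0, 3.0]

q = plug_in_estimate(sample1, sample2)
overall = overall_error_rate(0.5, 0.1, 0.3)
```

For these illustrative samples the means are 0 and 2 and the pooled standard deviation is 1, so $D = 2$ and $Q = G(-1)$, about 0.159.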
In an excellent account of the estimation problem Glick(5) devoted particular attention to the counting estimators, including the apparent error rate or resubstitution method(14) and its commonly used version modified according to a leaving-one-out process, the U method of Lachenbruch and Mickey.(13) Although the latter method is robust and nearly eliminates the bias associated with the apparent error rate, Glick(5) has pointed out that it may have a large variance for small $n$ due to the fact that the observations are assigned outright to a population during the estimation process.

* For correspondence contact Dr. McLachlan.

An estimator which overcomes this problem is the so-called posterior probability estimator considered by Fukunaga and Kessell,(15,16) Lissack and Fu(17) and Moore et al.(10) With this estimator each observation is not assigned outright to a population; rather, it is given an estimated probability of membership of each population. In the present context this estimator takes the form

$$e = \sum_{j=1}^{n} \min\{\theta_1(z_j), \theta_2(z_j)\}/n, \tag{1.3}$$

where $z_1, \ldots, z_n$ denote $n$ observations drawn from a