A BOOTSTRAP INTERVAL ESTIMATOR FOR BAYES' CLASSIFICATION ERROR

Chad M. Hawes and Carey E. Priebe
Johns Hopkins University
Department of Applied Mathematics and Statistics
Baltimore, MD 21218

ABSTRACT

Using a finite-length training set, we propose a new estimation approach suitable as an interval estimate of the Bayes-optimal classification error $L^*$. We arrive at this estimate by constructing bootstrap training sets of varying size from the fixed, finite-length original training set. We assume a power-law decay curve for the unconditional error rate as a function of training sample size $n$, and fit the bootstrap-estimated unconditional error rate curve to this power-law form. Using a result from Devijver, we do this twice: once for the $k$-nearest-neighbor (kNN) rule, to provide an upper bound on $L^*$, and again for Hellman's $(k,k')$ nearest neighbor rule with reject option, which gives a lower bound for $L^*$. The result is an asymptotic interval estimate of $L^*$ from a finite-length training sample. We apply our estimator to two classification examples, obtaining Bayes' error estimates.

Index Terms— Error rate estimation, Bayes' error, classification, bootstrap

1. INTRODUCTION

We propose a new error rate estimation approach suitable as an interval estimate of the Bayes-optimal probability of misclassification from a finite sample. Given an observed, unlabeled random (feature) vector $X$ to be classified, and independent, identically distributed training data $D_n = \{(X_i, Y_i)\}_{i=1}^{n}$ with $(X_i, Y_i) : \Omega \to \mathbb{R}^d \times \{0, 1\}$ drawn from an unknown joint distribution $F_{XY}$, the pattern recognition problem is to select a classification rule $g : \mathbb{R}^d \to \{0, 1\}$ that predicts the unknown class label $Y$ with minimal misclassification error, where $(X, Y)$ is distributed $F_{XY}$ and independent of $D_n$. The performance of any classification rule cannot improve on the Bayes-optimal error rate, given by
$$L^* \equiv \inf_{g : \mathbb{R}^d \to \{0,1\}} P\{g(X) \neq Y\}.$$
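Since $L^*$ depends only on $F_{XY}$, it is available in closed form for simple synthetic problems, which is how estimators of the Bayes error can be validated. As a minimal sketch (our own illustration; the equal-prior univariate Gaussian mixture and its parameter are hypothetical, not from the paper), for classes $N(0,1)$ and $N(\mu,1)$ with equal priors the Bayes rule thresholds at $\mu/2$, giving $L^* = \Phi(-\mu/2)$:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical problem: equal-prior N(0,1) vs N(mu,1) in one dimension.
mu = 1.5
L_star = Phi(-mu / 2.0)  # Bayes-optimal error: no rule can do better
print(L_star)            # ~0.2266 for mu = 1.5
```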
Following the notational convention of [1], let us denote the finite-sample conditional probability of error for the kNN rule by $L_n(k) = P\{g_{kNN}(X) \neq Y \mid D_n\}$ and the unconditional error rate by $\bar{L}_n(k) = E[L_n(k)] = P\{g_{kNN}(X) \neq Y\}$. We denote the asymptotic conditional and unconditional error rates by $L_\infty(k) = \lim_{n \to \infty} L_n(k)$ and $\bar{L}_\infty(k) = \lim_{n \to \infty} \bar{L}_n(k)$, respectively.

Devijver [2] derives asymptotic upper and lower bounds on the Bayes-optimal classification error $L^*$ using the $k$-nearest-neighbor (kNN) rule originally developed by Fix and Hodges [3] and the $(k,k')$ NN rule with reject option developed by Hellman [4]. In our notation, Devijver's bounds are
$$\bar{L}_\infty(k,k') \;\le\; L^* \;\le\; \bar{L}_\infty(k), \qquad (1)$$
where $\bar{L}_\infty(k,k')$ is the asymptotic unconditional error rate of Hellman's $(k,k')$ NN rule, and $k'$ satisfies $\lceil \frac{k+1}{2} \rceil < k' \le k$. All asymptotics here are for fixed $k$ and $k'$ as the training sample size $n \to \infty$. These bounds on the Bayes-optimal error are tight; unfortunately, they are asymptotic. In practice we never have an infinite training sample at our disposal, so obtaining estimates of $\bar{L}_\infty(k,k')$ and $\bar{L}_\infty(k)$ is non-trivial.

In practice, using Monte Carlo simulation, we often construct multiple simulated data sets for different sample sizes $n$, evaluate an estimate of the statistical parameter of interest (e.g., detector false alarm rate in additive or multiplicative non-Gaussian noise), and plot the estimated statistical parameter as a function of the training sample size $n$. The resulting curve can then be used to interpolate the expected system performance at desired sample sizes, or to extrapolate performance if a parametric form for the estimated curve (e.g., linear, exponential) is recognized. We extend Devijver's large-sample kNN bounding approach for $L^*$ to the finite-sample setting by this idea of estimating the error rate at various sample sizes.
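To make the two rules behind the bounds in (1) concrete, the following sketch estimates both empirical error rates on synthetic data. The Gaussian class-conditional model, the choices $k = 5$ and $k' = 4$, and the convention that rejected points are not counted as errors are our own illustrative assumptions, not specifications from the paper:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def sample(n, d=2, shift=1.5):
    """Hypothetical two-class problem: N(0, I) vs N(shift*1, I), equal priors."""
    y = rng.integers(0, 2, n)
    x = rng.normal(0.0, 1.0, (n, d)) + shift * y[:, None]
    return x, y

Xtr, ytr = sample(2000)
Xte, yte = sample(1000)

k, k_prime = 5, 4  # k' must satisfy ceil((k+1)/2) < k' <= k
nn = NearestNeighbors(n_neighbors=k).fit(Xtr)
_, idx = nn.kneighbors(Xte)
votes = ytr[idx].sum(axis=1)  # class-1 votes among the k nearest neighbors

# Standard kNN rule: majority vote.
pred_knn = (votes > k / 2).astype(int)
err_knn = np.mean(pred_knn != yte)

# Hellman's (k, k') rule: decide only when >= k' neighbors agree, else reject.
decide_1 = votes >= k_prime
decide_0 = (k - votes) >= k_prime
decided = decide_1 | decide_0
pred_rej = decide_1.astype(int)
# Count an error only when the rule decides and is wrong (rejects are not errors).
err_rej = np.mean(decided & (pred_rej != yte))

print(err_rej, err_knn)  # empirical analogues of the lower/upper bounds in (1)
```

Because $k' > \lceil \frac{k+1}{2} \rceil$, every error committed by the reject rule is also a majority-vote error, so the reject rule's empirical error never exceeds the kNN error, mirroring the ordering in (1).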
Our basic approach is to draw bootstrap training samples of different sizes $n$ from the fixed, finite available training set $D_N$, construct an estimated error rate decay curve as a function of $n$, and fit a parametric power-law form to the resulting curve. This process is performed for the standard kNN rule to yield an estimate $\hat{\bar{L}}_\infty(k)$ of $\bar{L}_\infty(k)$, and then repeated using Hellman's $(k,k')$ NN rule with reject option, giving an estimate $\hat{\bar{L}}_\infty(k,k')$ for $\bar{L}_\infty(k,k')$. The result is an asymptotic interval estimator $[\hat{\bar{L}}_\infty(k,k'),\, \hat{\bar{L}}_\infty(k)]$ for the Bayes-optimal classification error using a finite training sample.
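The curve-fitting step can be sketched as follows. We assume the power-law decay form $\bar{L}_n \approx \bar{L}_\infty + c\,n^{-\alpha}$; this parameterization and the use of SciPy's `curve_fit` are our own choices for illustration, and the synthetic `err_hat` values stand in for the bootstrap error-rate estimates that, in the actual procedure, come from evaluating the kNN or $(k,k')$ rule on resampled training sets of each size:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, L_inf, c, alpha):
    """Assumed decay of the unconditional error rate: L_inf + c * n^(-alpha)."""
    return L_inf + c * n ** (-alpha)

# Hypothetical bootstrap error-rate estimates at increasing sample sizes n.
sizes = np.array([50.0, 100.0, 200.0, 400.0, 800.0, 1600.0])
rng = np.random.default_rng(1)
true_L_inf = 0.10
err_hat = power_law(sizes, true_L_inf, 0.8, 0.7) + rng.normal(0.0, 0.002, sizes.size)

# Fit the decay curve; the fitted intercept estimates the asymptotic error rate.
popt, _ = curve_fit(power_law, sizes, err_hat,
                    p0=[0.1, 1.0, 0.5],
                    bounds=([0.0, 0.0, 0.0], [1.0, np.inf, 2.0]))
L_inf_hat = popt[0]  # estimate of the asymptotic unconditional error rate
print(L_inf_hat)
```

Running this once with the kNN error curve and once with the $(k,k')$ reject-rule error curve yields the two endpoints of the interval estimator.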