On Visualization and Aggregation of Nearest Neighbor Classifiers

Anil K. Ghosh, Probal Chaudhuri, and C.A. Murthy

A.K. Ghosh and P. Chaudhuri are with the Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata-700 108, India. E-mail: anilkghosh@rediffmail.com, probal@isical.ac.in.
C.A. Murthy is with the Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata-700 108, India. E-mail: murthy@isical.ac.in.

Abstract—Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. A major issue in k-nearest neighbor classification is how to find an optimal value of the neighborhood parameter k. In practice, this value is generally estimated by the method of cross-validation. However, the ideal value of k in a classification problem depends not only on the entire data set, but also on the specific observation to be classified. Instead of using any single value of k, this paper studies results for a finite sequence of classifiers indexed by k. Along with the usual posterior probability estimates, a new measure, called the Bayesian measure of strength, is proposed and investigated in this paper as a measure of evidence for different classes. The results of these classifiers and their corresponding estimated misclassification probabilities are visually displayed using shaded strips. These plots provide an effective visualization of the evidence in favor of different classes when a given data point is to be classified. We also propose a simple weighted averaging technique that aggregates the results of different nearest neighbor classifiers to arrive at the final decision. Based on the analysis of several benchmark data sets, the proposed method is found to be better than using a single value of k.

Index Terms—Bayesian strength function, misclassification rates, multiscale visualization, neighborhood parameter, posterior probability, prior distribution, weighted averaging.

1 INTRODUCTION

In supervised classification problems, we usually have a training sample of the form $\{(\mathbf{x}_n, c_n),\ n = 1, 2, \ldots, N\}$, where the $\mathbf{x}_n$s are the measurement vectors and the $c_n$s are the class labels of the training sample observations. Based on this available training sample, one forms a finite partition $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_J$ of the sample space $\mathcal{X}$ such that an observation $\mathbf{x}$ is classified to the $j$th population if $\mathbf{x} \in \mathcal{X}_j$. There are some well-known parametric [17], [31] and nonparametric [37], [11], [23] methods in the existing literature for finding such partitions. The nearest neighbor technique [12], [8], [10], [9] is one of the most popular nonparametric methods for this purpose. In order to classify an observation by the k-nearest neighbor method (k-NN), we assume the posterior probability of a specific class to be constant over a small neighborhood around that observation. Generally, a closed ball of radius $r_k$ is taken as this neighborhood, where $r_k$ is the distance between the observation and its $k$th nearest neighbor. We classify an observation to the class which has the maximum number of representatives in this neighborhood. The parameter $k$, which determines the size of this neighborhood, can be viewed as a measure of the smoothness of the posterior probability estimates, and we will hereafter refer to it as the neighborhood parameter. A discussion of the bias and the variance of the posterior probability estimates for different $k$ is available in [16], [14], [11]. The performance of the nearest neighbor classification rule depends heavily on the value of this neighborhood parameter $k$. Existing theoretical results [29], [8], [16] suggest that $k$ should depend on the training sample size $N$ and that it should vary with $N$ in such a way that $k \to \infty$ and $k/N \to 0$ as $N \to \infty$.
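To make the rule above concrete, here is a minimal sketch of a k-NN classifier with Euclidean distance. It is not the authors' implementation; the function name and interface are illustrative only.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k):
    """Classify x by majority vote among its k nearest training points.

    A minimal sketch of the k-NN rule: the neighborhood is the closed
    ball whose radius r_k is the distance from x to its kth nearest
    neighbor, and x is assigned to the class with the largest number
    of representatives inside that ball.
    """
    # Euclidean distances from x to every training observation.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors (ties broken arbitrarily).
    nn = np.argsort(dists)[:k]
    # Class frequencies within the neighborhood act as posterior
    # probability estimates; return the most frequent class.
    labels, counts = np.unique(y_train[nn], return_counts=True)
    return labels[np.argmax(counts)]
```

Note that $k$ controls the smoothing: small $k$ gives highly local, high-variance posterior estimates, while large $k$ averages over a wide ball and increases bias.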
In practice, the optimal value of $k$ depends on the available training sample observations, and one generally uses resampling techniques like cross-validation [28], [40] to determine it. However, the optimal value of $k$ is case specific: it depends on the observation to be classified in addition to the competing population distributions. Therefore, in a classification problem, instead of fixing the value of $k$, it may be more useful to look at the results for different neighborhood parameters and then combine them to arrive at the final decision.

In this paper, we study classification using different neighborhood parameters simultaneously. Broadly speaking, this paper has two major components. In Section 2, we propose some discrimination measures to study the strength of the classification results for different $k$ and develop a device for visual presentation of these results using shaded strips. The resulting plots provide a visual comparison of the strength of evidence in favor of different classes for a specific data point in the sample space. They are useful, especially in higher dimensional spaces, for visualizing the distribution of training sample data points from different populations in neighborhoods of varying sizes around a test case, and they may also help in making the final classification decision. Such a visual approach to discriminant analysis is also available in [18] and [19], where the authors used a range of values for the bandwidth parameters of the kernel density estimates of the different competing classes. Earlier authors [6], [20] used similar ideas to visualize significant features in univariate and bivariate function estimation problems.

The other major component of the paper is introduced in Section 3 and concerns the aggregation of different nearest neighbor classifiers. Here, we use a weighted averaging technique that combines the results of nearest neighbor classifiers for different values of $k$ to arrive at the final decision.
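As an illustration of the two ideas above (selecting $k$ by cross-validation and aggregating classifiers over a range of $k$), the following sketch reuses knn_classify from the previous block and weights each classifier by its leave-one-out accuracy. This weighting scheme is a placeholder for illustration only; the weights actually developed in Section 3 are derived from estimated misclassification probabilities and differ in detail.

```python
import numpy as np

def loo_error(X_train, y_train, k):
    """Leave-one-out misclassification rate of the k-NN rule,
    the kind of cross-validation estimate used to choose k."""
    n = len(y_train)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i  # hold out observation i
        pred = knn_classify(X_train[i], X_train[mask], y_train[mask], k)
        errors += pred != y_train[i]
    return errors / n

def aggregated_knn(x, X_train, y_train, ks):
    """Weighted-average aggregation over a sequence of neighborhood
    parameters: each k votes for its predicted class, weighted by its
    estimated accuracy. Illustrative only; not the Section 3 weights."""
    votes = {}
    for k in ks:
        w = 1.0 - loo_error(X_train, y_train, k)  # higher accuracy, larger weight
        c = knn_classify(x, X_train, y_train, k)
        votes[c] = votes.get(c, 0.0) + w
    return max(votes, key=votes.get)
```

For example, aggregated_knn(x0, X, y, ks=range(1, 30, 2)) combines fifteen nearest neighbor classifiers rather than committing to the single cross-validated $k$.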