Adaptive Metric Nearest Neighbor Classification

Carlotta Domeniconi, Computer Science Department, University of California, Riverside, CA 92521, carlotta@cs.ucr.edu
Jing Peng, Computer Science Department, Oklahoma State University, Stillwater, OK 74078, jpeng@cs.okstate.edu
Dimitrios Gunopulos, Computer Science Department, University of California, Riverside, CA 92521, dg@cs.ucr.edu

Abstract

Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are highly adaptive to query locations. Neighborhoods are elongated along less relevant feature dimensions and constricted along the most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of simulated and real world data.

1 Introduction

In a classification problem, we are given $J$ classes and $N$ training observations. The training observations consist of feature vectors $\mathbf{x} = (x_1, \ldots, x_q) \in \mathbb{R}^q$ and the known class labels $j = 1, \ldots, J$. The goal is to predict the class label of a given query $\mathbf{x}_0$. The $K$ nearest neighbor classification method [3, 8, 9, 10] is a simple and appealing approach to this problem: it finds the $K$ nearest neighbors of $\mathbf{x}_0$ in the training set, and then predicts the class label of $\mathbf{x}_0$ as the most frequent one occurring among the $K$ neighbors.
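As a concrete illustration of the baseline rule described above (not the adaptive method this paper develops), a minimal $K$ nearest neighbor classifier under the plain Euclidean metric can be sketched as follows; the function name and toy data are ours:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, K=5):
    """Predict the class of query x0 as the majority label
    among its K Euclidean nearest neighbors."""
    dists = np.linalg.norm(X_train - x0, axis=1)  # distance from x0 to each training point
    nearest = np.argsort(dists)[:K]               # indices of the K closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two well-separated classes in the plane.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1]), K=3))  # prints 0
```

With $K = 3$, the two class-0 points outvote the one class-1 point in the neighborhood, so the query is assigned class 0.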
Such a method produces continuous and overlapping, rather than fixed, neighborhoods, and uses a different neighborhood for each individual query, so that all points in the neighborhood are close to the query. In addition, it has been shown [4, 5] that the one nearest neighbor rule has an asymptotic error rate that is at most twice the Bayes error rate, independent of the distance metric used.

The nearest neighbor rule becomes less appealing with finite training samples, however. This is due to the curse of dimensionality [2]. Severe bias can be introduced in the nearest neighbor rule in a high dimensional input feature space with finite samples. As such, the choice of a distance measure becomes crucial in determining the outcome of nearest neighbor classification [6, 7, 9]. The commonly used Euclidean distance measure, while computationally simple, implies that the input space is isotropic. However, the assumption of isotropy is often invalid and generally undesirable in many practical applications: class conditional probabilities need not vary with equal strength in all directions of the feature space emanating from the input query. Capturing such directional information is therefore of great importance to any classification procedure in high dimensional settings.

In this paper we propose an adaptive nearest neighbor classification method that aims to minimize bias in high dimensions. We estimate a flexible metric for computing neighborhoods based on a Chi-squared distance analysis. The resulting neighborhoods are highly adaptive to query locations. Moreover, the neighborhoods are elongated along less relevant feature dimensions and constricted along the most influential ones. As a result, the class conditional probabilities tend to be constant in the modified neighborhoods, whereby better classification performance can be obtained.

(This research was supported by NSF IIS-9907477 and the US Dept. of Defense.)
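The effect of an anisotropic metric can be seen with a per-dimension weighted Euclidean distance: large weights constrict the neighborhood along influential dimensions, while small weights elongate it along less relevant ones. The sketch below uses fixed hand-picked weights purely for illustration; it is not the weighting estimator developed in this paper:

```python
import numpy as np

def weighted_distance(x, x0, w):
    """Weighted Euclidean distance sqrt(sum_i w_i * (x_i - x0_i)^2).
    A large w_i makes displacement along dimension i costly
    (constricted neighborhood); a small w_i makes it cheap
    (elongated neighborhood)."""
    return np.sqrt(np.sum(w * (x - x0) ** 2))

x0 = np.zeros(2)
a = np.array([1.0, 0.0])   # unit offset along dimension 1
b = np.array([0.0, 1.0])   # unit offset along dimension 2
w = np.array([4.0, 0.25])  # dimension 1 deemed influential, dimension 2 less relevant

print(weighted_distance(a, x0, w))  # 2.0: constricted along dimension 1
print(weighted_distance(b, x0, w))  # 0.5: elongated along dimension 2
```

Two points at the same Euclidean offset from the query thus land at very different weighted distances, which is exactly the anisotropy the Euclidean metric cannot express.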
2 Local Feature Relevance Measure

Kernel methods are based on the assumption of smoothness of the target functions, which translates into locally constant class posterior probabilities for a classification problem. This assumption, however, becomes invalid for any fixed distance metric when the input observation approaches class boundaries. In the following, we describe a nearest neighbor classification technique that is capable of producing a local neighborhood in which the posterior probabilities are approximately constant, and that is highly adaptive to query locations.