Fuzzy Labeled Soft Nearest Neighbor Classification with Relevance Learning

Thomas Villmann
University Leipzig, Clinic for Psychotherapy
04107 Leipzig, Germany
villmann@informatik.uni-leipzig.de

Frank-Michael Schleif
University Leipzig, Dept. of Math. and C.S.
04109 Leipzig, Germany
schleif@informatik.uni-leipzig.de

Barbara Hammer
Clausthal University of Technology, Dept. of Math. and C.S.
38678 Clausthal-Zellerfeld, Germany
hammer@in.tu-clausthal.de

Abstract

We extend soft nearest neighbor classification to fuzzy classification with adaptive class labels. The adaptation follows a gradient descent on a cost function. Further, the approach is applicable to general distance measures; in particular, task-specific choices and relevance learning for metric adaptation are possible. The performance of the algorithm is shown on synthetic as well as real-life data taken from proteomic research.

keywords: fuzzy classification, LVQ, relevance learning

1 Introduction

Kohonen's Learning Vector Quantization (LVQ) belongs to the class of supervised learning algorithms for nearest prototype classification (NPC) [8]. NPC relies on a set of prototype vectors (also called codebook vectors) labeled according to the given data classes. The prototype locations are adapted by the algorithm such that they represent their respective classes. Thus, NPC is a local classification method in the sense that the classification boundaries are approximated locally by the prototypes. The classification provided by the trained LVQ is crisp, i.e., an unknown data point is uniquely assigned, based on similarity, to a prototype, which itself is uniquely related to a class. Several extensions exist to improve the basic scheme.

Recently a new method, Soft Nearest Prototype Classification (SNPC), has been proposed by Seo et al. [11], in which soft assignments of the data vectors to the prototypes are introduced. The determination of the soft assignments is based on a Gaussian mixture approach.
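
The difference between the crisp assignment of NPC and a soft assignment can be illustrated with a small sketch. The code below is not the formulation of [11]; it assumes a squared Euclidean distance, a single kernel width `sigma` shared by all prototypes, and equal prototype priors, and the function names and toy data are hypothetical.

```python
import math


def squared_distance(v, w):
    """Squared Euclidean distance between data vector v and prototype w."""
    return sum((a - b) ** 2 for a, b in zip(v, w))


def crisp_label(v, prototypes, labels):
    """Crisp NPC rule: return the class label of the closest prototype."""
    distances = [squared_distance(v, w) for w in prototypes]
    nearest = min(range(len(prototypes)), key=distances.__getitem__)
    return labels[nearest]


def soft_assignments(v, prototypes, sigma=1.0):
    """Normalized Gaussian assignment weights of v to every prototype.

    Assumption: one shared kernel width sigma and equal prototype priors.
    """
    distances = [squared_distance(v, w) for w in prototypes]
    # Shift by the smallest distance so the largest exponent is zero
    # (numerical stability; the normalized result is unchanged).
    d_min = min(distances)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data (hypothetical): three prototypes with crisp class labels.
prototypes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
labels = [0, 1, 1]
v = (0.9, 0.1)

crisp = crisp_label(v, prototypes, labels)
soft = soft_assignments(v, prototypes, sigma=1.0)
```

For this toy data the crisp rule returns only the label of the closest prototype, whereas the soft rule spreads the assignment of `v` over all three prototypes, with the largest weight on the closest one.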
In SNPC, however, the class labels of the prototype vectors remain crisp and are fixed a priori, as usual in LVQ.

Generally, the crisp (unique) labeling in LVQ methods has the disadvantage that the initial prototype labeling may not adequately reflect the real class label distribution of the data points in the data space. Data with different class labels may be assigned to the same prototype (misclassifications) because the classes overlap. A solution could be a post-labeling of the prototypes according to the data statistics given by all data vectors represented by the considered prototype, leading to a fuzzy labeling [13]. However, this method is not appropriate for online learning, since crisp prototype label information is essential for all classical LVQ learning schemes to determine correct and incorrect classification during prototype adaptation.

In this article we introduce a dynamic fuzzy labeling of prototypes. As a consequence, the information about correct or incorrect classification required during learning is lost and, hence, a new learning scheme has to be established. Based on SNPC we derive an adaptation scheme for labels and prototypes such that adaptive fuzzy labeling can be achieved.

We apply the new algorithm to profiling of mass spectrometric data in cancer research. In recent years, proteomic¹ profiling based on mass spectrometry (MS) has become an important tool for studying cancer at the protein and peptide level in a high-throughput manner. Additionally, MS-based serum profiling is a potential diagnostic tool to distinguish patients with cancer from normal subjects. The underlying algorithms for classification of the mass spectrometric data are one crucial point in obtaining valid and competitive results. Usually one is interested in finding decision boundaries close to the optimal Bayesian decision.
¹ Proteome: an ensemble of protein forms expressed in a biological sample at a given point in time [1].

Especially, for