Supervised Neural Gas for Classification of Functional Data and its Application to the Analysis of Clinical Proteom Spectra

Frank-Michael Schleif 1, Thomas Villmann 1 and Barbara Hammer 2

(1) University Leipzig, Dept. of Medicine, 04107 Leipzig, Germany
(2) TU-Clausthal, Dept. of Math. & C.S., 38678 Clausthal-Zellerfeld, Germany
{schleif,villmann}@informatik.uni-leipzig.de, +49(0)3419718896
{hammer}@in.tu-clausthal.de, +49(0)53237271[86][39]

Abstract. The analysis of functional data is a common task in bioinformatics. Spectral data obtained from mass spectrometric measurements in clinical proteomics are functional data of this kind and pose new challenges for an appropriate analysis. Here we focus on the determination of classification models for such data. In general, the available approaches for this task first transform the spectra into a vector space and then train a classifier. In doing so, the functional nature of the data is typically lost, which may lead to suboptimal classifier models. To take this into account, a wavelet encoding is applied to the spectral data, leading to a compact functional representation. Further, the Supervised Neural Gas classifier is extended by a functional metric. This allows the classifier to utilize the functional nature of the data in the modeling process. The presented method is applied to clinical proteome data, showing good results.

Key words: supervised neural gas, functional data analysis, clinical proteomics, wavelet analysis, spectra preprocessing

1 Introduction

Applications of mass spectrometry (MS) in clinical proteomics have gained tremendous visibility in the scientific and clinical community [10, 4]. One major objective is the search for potential classification models for cancer studies. For this purpose, efficient analysis and visualization of large high-dimensional data sets derived from patient cohorts is crucial.
Additionally, it is necessary to apply statistical analysis and pattern matching algorithms to attain validated signal patterns. Here we focus on the determination of classification models discriminating between multiple classes. A powerful tool for obtaining such models with high generalization ability is the prototype-based Supervised Neural Gas algorithm (SNG) [11]. Like all nearest prototype classifier algorithms, SNG relies heavily on the metric d, usually the standard Euclidean metric. For high-dimensional data as they occur in proteomic patterns, this choice is not adequate for two reasons: first, the functional nature of the data should be kept as far as possible; second, the noise present in the data set accumulates and is likely to disrupt the classification when a standard Euclidean approach is taken. Thus, a functional representation of the data with respect to the metric used, together with a weighting or pruning of irrelevant function parts of the inputs (which are not known a priori), would be desirable. Therefore we focus on a functional distance measure recently proposed in [6], which will be referred to as the functional metric. Further, feature selection is applied based on a statistical pre-analysis of the