Kernel Discriminant Analysis for Positive Definite and Indefinite Kernels

Elżbieta Pękalska and Bernard Haasdonk

Abstract—Kernel methods are a class of well-established and successful algorithms for pattern analysis due to their mathematical elegance and good performance. Numerous nonlinear extensions of pattern recognition techniques have been proposed so far based on the so-called kernel trick. The objective of this paper is twofold. First, we derive an additional kernel tool that is still missing, namely the kernel quadratic discriminant (KQD). We discuss different formulations of KQD based on the regularized kernel Mahalanobis distance in both complete and class-related subspaces. Second, we propose suitable extensions of kernel linear and quadratic discriminants to indefinite kernels. We provide classifiers that are applicable to kernels defined by any symmetric similarity measure. This is important in practice because problem-suited proximity measures often violate the requirement of positive definiteness. As in the traditional case, KQD can be advantageous for data with unequal class spreads in the kernel-induced spaces, which cannot be well separated by a linear discriminant. We illustrate this on artificial and real data for both positive definite and indefinite kernels.

Index Terms—Machine learning, pattern recognition, kernel methods, indefinite kernels, discriminant analysis.

1 INTRODUCTION

Kernel methods are powerful statistical learning techniques [38], [36], widely applied to various learning scenarios due to their flexibility and good performance. A kernel is a (conditionally) positive definite (pd) function $k(x, x')$ of two variables $x$ and $x'$, interpreted as a generalized inner product, and hence a natural similarity, in the reproducing kernel Hilbert space $\mathcal{H}$ induced by $k$ [33], [40]. Due to the reproducing property of $k$, kernel-based classifiers are built implicitly in $\mathcal{H}$ and are often expressed as linear combinations of kernel values. Many traditional learning methods have by now been given kernel-based formulations, including support vector machines (SVM), kernel PCA, the kernel Fisher discriminant (KFD), kernel k-means, and so on [36]. An additional tool that is still missing within this set of simple approaches is the kernel quadratic discriminant (KQD). In this paper, we derive KQD as a natural extension of the quadratic discriminant in a Euclidean space. Three variants are considered, in either full or class-related kernel-induced subspaces.

Although traditional kernel methods have been applied to general nonvectorial data descriptions, such as strings, bags of words, graphs, shapes, and probability models [35], [36], the class of permissible kernels is often, and frequently wrongly, considered to be limited by the requirement of positive definiteness. In practice, however, many non-pd similarity measures arise, e.g., when invariance or robustness is incorporated into the measure [37], [20], [13]. Further reasons include suboptimal optimization procedures for measure derivation [28], partial projections or occlusions [20], and context-dependent alignments or object comparisons [6], [30]. Indefinite (dis)similarities arise naturally from non-Euclidean or nonmetric dissimilarities, such as modified Hausdorff distances [6], or from non-pd similarities, such as the Kullback-Leibler divergence between probability distributions. Consequently, there is a practical need to handle these measures properly.
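To make the positive-definiteness requirement concrete, the following minimal sketch (ours, not from the paper; the kernel choices and parameter values are illustrative) builds two common similarity matrices and inspects their spectra. The Gaussian kernel is pd, so its eigenvalues are nonnegative up to numerical error, whereas the sigmoid similarity is, for typical parameter settings such as those below, indefinite.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Pairwise squared Euclidean distances, clipped at 0 for numerical safety.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def sigmoid_similarity(X, alpha=1.0, c=-1.0):
    # The sigmoid "kernel" is a popular similarity that is not pd in general.
    return np.tanh(alpha * X @ X.T + c)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

for name, K in [("Gaussian", gaussian_kernel(X)),
                ("sigmoid", sigmoid_similarity(X))]:
    eigvals = np.linalg.eigvalsh(K)  # K is symmetric, so the spectrum is real
    verdict = "pd up to numerical error" if eigvals.min() > -1e-10 else "indefinite"
    print(f"{name}: min eigenvalue = {eigvals.min():.3e} ({verdict})")
```

A measure flagged as indefinite here cannot be interpreted as an inner product in any Hilbert space, which is precisely the situation that the indefinite-kernel discriminants in this paper are designed to handle.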
Metric dissimilarity measures can be embedded in Banach spaces, in which learning algorithms such as large-margin classifiers can be applied [39], [16], [4]. Although these techniques provide alternatives to certain kernel methods for metric data, more general approaches are needed. While many researchers choose to regularize non-pd kernels to make them pd, a natural extension of Mercer kernels leads to indefinite or Krein kernels [2], [25], [21], [11], [26], or dyadic kernels [18]. Both are examples of proximity representations, i.e., matrices whose elements encode degrees of similarity between pairs of objects and optimized prototypes [26]. It is therefore of high interest to develop and investigate methods that work with indefinite kernels. Indeed, an additional contribution of this paper is a sound underpinning of the approaches that extend kernel linear and quadratic discriminants to indefinite kernels. Experiments on toy and real-world data show good performance of the KQD methods for both positive definite and indefinite kernels.

The paper is organized as follows: Section 2 starts with preliminaries on kernels. Section 3 presents indefinite kernel Fisher discriminant analysis. Section 4 is the main part, introducing different formulations of KQD analysis for both positive definite and indefinite kernels. Section 5 presents an experimental study illustrating the performance of kernel discriminant analysis on toy and real-world data.
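As a point of reference for the derivations in Section 4, the quadratic discriminant in a Euclidean space, which KQD generalizes, can be written down in a few lines. The sketch below is ours and purely illustrative (equal class priors are assumed; the function names and the value of the regularization constant are our own choices); the term reg * I echoes the idea behind the regularized kernel Mahalanobis distance discussed in this paper.

```python
import numpy as np

def fit_class(Xc, reg=1e-3):
    # Per-class mean and regularized covariance; reg * I guards against
    # a singular covariance estimate.
    mu = Xc.mean(axis=0)
    C = np.cov(Xc, rowvar=False) + reg * np.eye(Xc.shape[1])
    return mu, np.linalg.inv(C), np.linalg.slogdet(C)[1]

def quadratic_scores(x, class_params):
    # Squared Mahalanobis distance plus log-determinant per class;
    # minimizing this maximizes the Gaussian log-likelihood under
    # equal priors.
    return [(x - mu) @ Cinv @ (x - mu) + logdet
            for mu, Cinv, logdet in class_params]

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=(100, 2))   # wide class spread
X2 = rng.normal(2.0, 0.3, size=(100, 2))   # narrow class spread
class_params = [fit_class(X1), fit_class(X2)]

x = np.array([1.5, 1.5])
print("assigned class:", int(np.argmin(quadratic_scores(x, class_params))))
```

Because the two classes have unequal spreads, the resulting decision boundary is curved; this is exactly the regime in which a quadratic discriminant, and hence KQD in kernel-induced spaces, can outperform a linear one.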