MAHALANOBIS KERNEL BASED ON PROBABILISTIC PRINCIPAL COMPONENT ANALYSIS

M. Fauvel, A. Villa, J. Chanussot and J. A. Benediktsson

INRA-DYNAFOR and University of Toulouse, France
GIPSA-lab, Grenoble Institute of Technology, France
Faculty of Electrical and Computer Engineering, University of Iceland, Iceland

ABSTRACT

A kernel adapted to the spectral dimension of hyperspectral images is proposed in this paper. A distance based on a statistical cluster model is used to construct a radial kernel. This class-specific kernel realizes a compromise between a conventional Gaussian kernel and a Gaussian kernel on the first principal components of the considered class. An automatic gradient-based optimization is used to select the optimal hyperparameters. Experimental results on a real hyperspectral image show that the proposed kernel is effective compared to the conventional Gaussian kernel. Furthermore, it is less sensitive to one of its hyperparameters than the Gaussian kernel applied to the first principal components of the data.

Index Terms: Hyperspectral image, Mahalanobis kernel, probabilistic principal component analysis, support vector machine, kernel methods.

1. INTRODUCTION

The Gaussian kernel is assuredly one of the most widely used kernels in kernel learning algorithms for remote sensing applications [1]. It is based on the Euclidean distance between two samples (or spectra), $\mathbf{x}$ and $\mathbf{z}$, in the input space $\mathbb{R}^d$:

$$k_g(\mathbf{x}, \mathbf{z}) = \exp\left(-\frac{(\mathbf{x}-\mathbf{z})^t(\mathbf{x}-\mathbf{z})}{2\sigma^2}\right) \quad (1)$$

where $\sigma$ is a hyperparameter that, roughly speaking, controls how close or similar two samples are considered to be in $\mathbb{R}^d$. For high dimensional data, such as hyperspectral images, it is known that the conventional Euclidean distance may suffer from the high dimensionality [2]. An alternative kernel, based on the Mahalanobis distance between two samples, has been proposed in [3] and, in particular, in [4] for the classification of remote sensing images:

$$d_m(\mathbf{x}, \mathbf{z}) = (\mathbf{x}-\mathbf{z})^t\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\mathbf{z}) \quad (2)$$

where $\boldsymbol{\Sigma}$ is the covariance matrix of either the whole set of training samples or the samples of the considered class. However, computing the inverse of $\boldsymbol{\Sigma}$ for hyperspectral images is difficult, and regularization is therefore needed [5]. In [6], the inversion of the covariance matrix $\boldsymbol{\Sigma}$ was regularized to make it well conditioned even in high dimensional spaces, allowing eq. (2) to be defined for each class separately. The regularization method was based on probabilistic principal component analysis (PPCA) [7], which assumes that the $d$ observed variables $\mathbf{x}$ are a linear combination of $p$ unobserved variables $\mathbf{s}$, $p$ being lower than $d$:

$$\mathbf{x} = \mathbf{W}\mathbf{s} + \boldsymbol{\mu} + \boldsymbol{\varepsilon} \quad (3)$$

with $\mathbf{s} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_p)$ and $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \varepsilon^2\mathbf{I}_d)$, $\mathcal{N}(\mathbf{0}, \mathbf{I}_p)$ being the normal distribution of dimension $p$ with zero mean $\mathbf{0}$ and identity covariance matrix $\mathbf{I}_p$. It follows that $\boldsymbol{\Sigma}$ has the expression

$$\boldsymbol{\Sigma} = \sum_{i=1}^{p}\left(\lambda_i^2 + \varepsilon^2\right)\mathbf{u}_i\mathbf{u}_i^t + \varepsilon^2\sum_{i=p+1}^{d}\mathbf{u}_i\mathbf{u}_i^t \quad (4)$$

where $\lambda_i$ is the $i$th singular value of $\mathbf{W}$ and $\mathbf{u}_i$ its corresponding left-singular vector. All the parameters ($\hat{\lambda}$, $\hat{\mathbf{u}}$ and $\hat{\varepsilon}$) can be estimated from the sample covariance matrix $\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^t$, and the intrinsic dimension $\hat{p}$ is estimated using the Bayesian Information Criterion (BIC) [8]. The inverse can then be computed explicitly:

$$\hat{\boldsymbol{\Sigma}}^{-1} = \underbrace{\sum_{i=1}^{\hat{p}}\frac{1}{\hat{\lambda}_i^2 + \hat{\varepsilon}^2}\,\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i^t}_{Q_s} + \underbrace{\frac{1}{\hat{\varepsilon}^2}\sum_{i=\hat{p}+1}^{d}\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i^t}_{Q_n} \quad (5)$$

This statistical model can be understood equivalently through a geometrical assumption: the data, with some additional white noise, belong to a cluster that lives in a $p$-dimensional subspace of $\mathbb{R}^d$.
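To make eqs. (4) and (5) concrete, the short Python sketch below estimates the regularized inverse covariance from the training samples of one class. It is a minimal illustration under stated assumptions, not the authors' implementation: the intrinsic dimension $p$ is passed in by hand rather than selected with BIC as in the paper, the noise variance is taken as the standard PPCA maximum-likelihood estimate (the mean of the $d - p$ trailing eigenvalues of $\hat{\boldsymbol{\Sigma}}$), and the function name ppca_inverse_covariance is hypothetical.

```python
import numpy as np

def ppca_inverse_covariance(X, p):
    """PPCA-regularized inverse of the sample covariance, cf. eqs. (4)-(5).

    X : (n, d) array of training samples from one class.
    p : intrinsic dimension (estimated with BIC in the paper; supplied
        by the user in this sketch).
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)                 # center the samples
    S = Xc.T @ Xc / n                       # sample covariance matrix
    evals, U = np.linalg.eigh(S)            # eigenvalues in ascending order
    evals, U = evals[::-1], U[:, ::-1]      # reorder in decreasing order

    # ML noise variance: mean of the d - p trailing eigenvalues
    eps2 = evals[p:].mean()

    # Signal part Q_s: the p leading eigenvalues of S estimate lambda_i^2 + eps^2
    Us = U[:, :p]
    Qs = Us @ np.diag(1.0 / evals[:p]) @ Us.T
    # Noise part Q_n: the remaining d - p directions all share variance eps^2
    Qn = (np.eye(d) - Us @ Us.T) / eps2
    return Qs + Qn
```

Note that no $d \times d$ linear system is ever solved: once the eigendecomposition is available, eq. (5) gives the inverse in closed form, which is the point of the regularization.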
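A radial kernel can then be built on the distance of eq. (2). This excerpt does not show the paper's exact kernel expression, so the form below, a Gaussian-type kernel with $d_m$ substituted for the squared Euclidean distance, is only a plausible sketch; both the function mahalanobis_kernel and its bandwidth parameter sigma are assumptions for illustration.

```python
def mahalanobis_kernel(x, z, Sigma_inv, sigma=1.0):
    """exp(-d_m(x, z) / (2 sigma^2)), with d_m as in eq. (2)."""
    diff = x - z
    return np.exp(-(diff @ Sigma_inv @ diff) / (2.0 * sigma**2))
```

Setting Sigma_inv to the identity recovers the conventional Gaussian kernel of eq. (1), while keeping only the $Q_s$ term of eq. (5) amounts to a Gaussian kernel on the first principal components of the class; the compromise mentioned in the abstract sits between these two extremes.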
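A quick synthetic run tying the two sketches together (all sizes and values are arbitrary):

```python
# Hypothetical usage: n = 200 samples, d = 50 bands, assumed p = 5.
rng = np.random.default_rng(0)
X = (rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
     + 0.1 * rng.standard_normal((200, 50)))
Sigma_inv = ppca_inverse_covariance(X, p=5)
print(mahalanobis_kernel(X[0], X[1], Sigma_inv, sigma=10.0))
```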