MAHALANOBIS KERNEL BASED ON PROBABILISTIC PRINCIPAL COMPONENT ANALYSIS

M. Fauvel, A. Villa, J. Chanussot and J. A. Benediktsson

INRA-DYNAFOR and University of Toulouse, France
GIPSA-lab, Grenoble Institute of Technology, France
Faculty of Electrical and Computer Engineering, University of Iceland, Iceland

ABSTRACT

A kernel adapted to the spectral dimension of hyperspectral images is proposed in this paper. A distance based on a statistical cluster model is used to construct a radial kernel. This class-specific kernel realizes a compromise between a conventional Gaussian kernel and a Gaussian kernel on the first principal components of the considered class. An automatic gradient-based optimization is used to select the optimal hyperparameters. Experimental results on a real hyperspectral image show that the proposed kernel is effective compared to the conventional Gaussian kernel. Furthermore, it is less sensitive to one of its hyperparameters than the Gaussian kernel applied to the first principal components of the data.

Index Terms: Hyperspectral image, Mahalanobis kernel, probabilistic principal component analysis, support vector machine, kernel methods.

1. INTRODUCTION

The Gaussian kernel is assuredly one of the most widely used kernels in kernel learning algorithms for remote sensing applications [1]. It is based on the Euclidean distance between two samples (or spectra), $\mathbf{x}$ and $\mathbf{z}$, in the input space $\mathbb{R}^d$:

$$k_g(\mathbf{x}, \mathbf{z}) = \exp\left(-\frac{(\mathbf{x}-\mathbf{z})^t(\mathbf{x}-\mathbf{z})}{2\sigma^2}\right) \quad (1)$$

where $\sigma$ is a hyperparameter that, roughly speaking, controls how close or similar two samples are considered to be in $\mathbb{R}^d$. For high dimensional data, such as hyperspectral images, it is known that the conventional Euclidean distance may suffer from the high dimensionality [2]. An alternative kernel, based on the Mahalanobis distance between two samples, has been proposed in [3] and, in particular, in [4] for the classification of remote sensing images:

$$d_m(\mathbf{x}, \mathbf{z}) = (\mathbf{x}-\mathbf{z})^t\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\mathbf{z}) \quad (2)$$

where $\boldsymbol{\Sigma}$ is the covariance matrix of either the whole set of training samples or the samples of the considered class. However, computing the inverse of $\boldsymbol{\Sigma}$ for hyperspectral images is difficult, and regularization is therefore needed [5]. In [6], the inversion of the covariance matrix $\boldsymbol{\Sigma}$ was regularized to make it well conditioned even in high dimensional spaces, allowing eq. (2) to be defined for each class separately. The regularization method was based on probabilistic principal component analysis (PPCA) [7], which assumes that the $d$ observed variables $\mathbf{x}$ are a linear combination of $p$ unobserved variables $\mathbf{s}$, $p$ being lower than $d$:

$$\mathbf{x} = \mathbf{W}\mathbf{s} + \boldsymbol{\mu} + \boldsymbol{\varepsilon} \quad (3)$$

with $\mathbf{s} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_p)$ and $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \varepsilon^2\mathbf{I}_d)$, $\mathcal{N}(\mathbf{0}, \mathbf{I}_p)$ being the normal distribution of dimension $p$ with zero mean $\mathbf{0}$ and identity covariance matrix $\mathbf{I}_p$. It follows that $\boldsymbol{\Sigma}$ has the expression

$$\boldsymbol{\Sigma} = \sum_{i=1}^{p}\left(\lambda_i^2 + \varepsilon^2\right)\mathbf{u}_i\mathbf{u}_i^t + \varepsilon^2\sum_{i=p+1}^{d}\mathbf{u}_i\mathbf{u}_i^t \quad (4)$$

where $\lambda_i$ is the $i$th singular value of $\mathbf{W}$ and $\mathbf{u}_i$ its corresponding left-singular vector. All the parameters ($\hat{\lambda}$, $\hat{\mathbf{u}}$ and $\hat{\varepsilon}$) can be estimated from the sample covariance matrix $\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^t$, and the intrinsic dimension $\hat{p}$ is estimated using the Bayesian Information Criterion (BIC) [8]. The inverse can then be computed explicitly:

$$\hat{\boldsymbol{\Sigma}}^{-1} = \underbrace{\sum_{i=1}^{\hat{p}}\frac{1}{\hat{\lambda}_i^2 + \hat{\varepsilon}^2}\,\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i^t}_{Q_s} + \underbrace{\frac{1}{\hat{\varepsilon}^2}\sum_{i=\hat{p}+1}^{d}\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i^t}_{Q_n} \quad (5)$$

This statistical model can be understood equivalently through a geometrical assumption: the data, with some additional white noise, belong to a cluster that lives in a $p$-dimensional subspace of $\mathbb{R}^d$.
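To make eqs. (4) and (5) concrete, the short Python sketch below estimates the regularized inverse covariance from the training samples of one class. It is a minimal illustration under stated assumptions, not the authors' implementation: the intrinsic dimension $p$ is passed in by hand rather than selected with BIC as in the paper, the noise variance is taken as the standard PPCA maximum-likelihood estimate (the mean of the $d - p$ trailing eigenvalues of $\hat{\boldsymbol{\Sigma}}$), and the function name ppca_inverse_covariance is hypothetical.

```python
import numpy as np

def ppca_inverse_covariance(X, p):
    """PPCA-regularized inverse of the sample covariance, cf. eqs. (4)-(5).

    X : (n, d) array of training samples from one class.
    p : intrinsic dimension (estimated with BIC in the paper; supplied
        by the user in this sketch).
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)                 # center the samples
    S = Xc.T @ Xc / n                       # sample covariance matrix
    evals, U = np.linalg.eigh(S)            # eigenvalues in ascending order
    evals, U = evals[::-1], U[:, ::-1]      # reorder in decreasing order

    # ML noise variance: mean of the d - p trailing eigenvalues
    eps2 = evals[p:].mean()

    # Signal part Q_s: the p leading eigenvalues of S estimate lambda_i^2 + eps^2
    Us = U[:, :p]
    Qs = Us @ np.diag(1.0 / evals[:p]) @ Us.T
    # Noise part Q_n: the remaining d - p directions all share variance eps^2
    Qn = (np.eye(d) - Us @ Us.T) / eps2
    return Qs + Qn
```

Note that no $d \times d$ linear system is ever solved: once the eigendecomposition is available, eq. (5) gives the inverse in closed form, which is the point of the regularization.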
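A radial kernel can then be built on the distance of eq. (2). This excerpt does not show the paper's exact kernel expression, so the form below, a Gaussian-type kernel with $d_m$ substituted for the squared Euclidean distance, is only a plausible sketch; both the function mahalanobis_kernel and its bandwidth parameter sigma are assumptions for illustration.

```python
def mahalanobis_kernel(x, z, Sigma_inv, sigma=1.0):
    """exp(-d_m(x, z) / (2 sigma^2)), with d_m as in eq. (2)."""
    diff = x - z
    return np.exp(-(diff @ Sigma_inv @ diff) / (2.0 * sigma**2))
```

Setting Sigma_inv to the identity recovers the conventional Gaussian kernel of eq. (1), while keeping only the $Q_s$ term of eq. (5) amounts to a Gaussian kernel on the first principal components of the class; the compromise mentioned in the abstract sits between these two extremes.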
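A quick synthetic run tying the two sketches together (all sizes and values are arbitrary):

```python
# Hypothetical usage: n = 200 samples, d = 50 bands, assumed p = 5.
rng = np.random.default_rng(0)
X = (rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
     + 0.1 * rng.standard_normal((200, 50)))
Sigma_inv = ppca_inverse_covariance(X, p=5)
print(mahalanobis_kernel(X[0], X[1], Sigma_inv, sigma=10.0))
```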