IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 47, NO. 4, APRIL 2009

Kernel Nonparametric Weighted Feature Extraction for Hyperspectral Image Classification

Bor-Chen Kuo, Cheng-Hsuan Li, and Jinn-Min Yang

Abstract—In recent years, many studies have shown that kernel methods are computationally efficient, robust, and stable for pattern analysis. Many kernel-based classifiers have been designed and applied to classify remotely sensed data, and some results show that kernel-based classifiers achieve satisfying performance. Many studies on hyperspectral image classification also show that nonparametric weighted feature extraction (NWFE) is a powerful tool for extracting hyperspectral image features. However, NWFE is still based on a linear transformation. In this paper, the kernel method is applied to extend NWFE to kernel-based NWFE (KNWFE). The new KNWFE possesses the advantages of both linear and nonlinear transformations, and the experimental results show that KNWFE outperforms NWFE, decision-boundary feature extraction, independent component analysis, kernel-based principal component analysis, and generalized discriminant analysis.

Index Terms—Feature extraction, image classification.

I. INTRODUCTION

In recent years, many studies [1]–[7] have shown that kernel methods are computationally efficient, robust, and stable for pattern analysis. Many kernel-based classifiers have been designed and applied to classify remotely sensed data, and some results show that kernel-based classifiers achieve satisfying performance [5]–[7]. The main idea of kernel methods is to map the input data from the original space to a convenient feature space by a nonlinear mapping, where inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping explicitly, and linear relations are sought among the images of the data items in the feature space.
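The idea that inner products in the feature space can be computed without applying the mapping explicitly can be illustrated with a small sketch (not from the paper): for the homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$, an explicit feature map $\phi$ is known, so we can check that $\langle \phi(x), \phi(z) \rangle = \langle x, z \rangle^2$ numerically. The sample points are illustrative assumptions.

```python
import numpy as np

def phi(x):
    # Explicit feature map for kappa(x, z) = <x, z>^2 on R^2:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), mapping R^2 into R^3.
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def kappa(x, z):
    # The kernel evaluates the same inner product directly in R^2.
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])  # illustrative sample points
z = np.array([3.0, 0.5])

# The "kernel trick": both sides agree, but kappa never forms phi(x).
assert np.isclose(np.dot(phi(x), phi(z)), kappa(x, z))
```

For higher-degree kernels or the Gaussian kernel, the feature space is much larger (even infinite-dimensional), which is exactly why evaluating $\kappa$ directly is preferred over constructing $\phi$.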
Some studies [8]–[13] have also shown that nonparametric weighted feature extraction (NWFE) [14] is powerful in reducing the dimensionality of hyperspectral image data. In this paper, we combine the advantages of the kernel method and NWFE to develop a kernel-based NWFE (KNWFE) for hyperspectral image classification.

This paper is organized as follows. The kernel trick is introduced in Section II. Some unsupervised and supervised feature extraction methods and their kernel versions are reviewed in Section III. KNWFE is proposed in Section IV. To reduce the influence of the singularity of the kernel matrix, the eigenvalue resolution is discussed in Section V. To evaluate the performance of the proposed method on real hyperspectral image data, an experiment is designed in Section VI, where the experimental results are also reported. Section VII contains comments and conclusions.

Manuscript received April 22, 2008; revised July 3, 2008 and September 4, 2008. Current version published March 27, 2009. B.-C. Kuo is with the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung 40306, Taiwan (e-mail: kbc@mail.ntcu.edu.tw). C.-H. Li is with the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung 40306, Taiwan, and also with the Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan. J.-M. Yang is with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621, Taiwan. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2008.2008308

II. KERNEL TRICK

Classification is easier when pixels are more sparsely distributed.
Generally speaking, images with high dimensionality (a large number of spectral bands) potentially have better class separability. The strategy of the kernel method is to embed the data from the original space $\mathbb{R}^d$ into a feature space $H$, a Hilbert space of higher dimensionality, where more effective hyperplanes for classification are expected to exist than in the original space. The inner product of samples in the feature space can then be computed directly from the original data items using a kernel function. This is based on the fact that a kernel function $\kappa : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ satisfies Mercer's theorem [4], i.e., there is a feature map $\phi$ into a Hilbert space $H$ such that $\kappa(x, z) = \langle \phi(x), \phi(z) \rangle$, where $x, z \in X$, if and only if $\kappa$ is a symmetric function for which the matrices $K = [\kappa(x_i, x_j)]_{1 \le i, j \le N}$, formed by restriction to any finite subset $\{x_1, \ldots, x_N\}$ of the space $X$, are positive semidefinite.

Suppose $x_1^{(i)}, \ldots, x_{N_i}^{(i)} \in \mathbb{R}^d$ are the samples in class $i$, $i = 1, \ldots, L$, and $N = N_1 + \cdots + N_L$. Let $X_i^T = [\phi(x_1^{(i)}), \ldots, \phi(x_{N_i}^{(i)})]$ and $X^T = [X_1^T, \ldots, X_L^T]$; then the kernel matrix $K = [\kappa(x_i, x_j)]_{1 \le i, j \le N}$ with respect to $\kappa$ on the samples is $XX^T$, i.e., $K = XX^T$. The following are some popular kernels.

1) Linear kernel: $\kappa(x, z) = \langle x, z \rangle$.
2) Polynomial kernel: $\kappa(x, z) = (\langle x, z \rangle + 1)^r$, $r \in \mathbb{Z}^+$.
3) Gaussian radial-basis-function (RBF) kernel: $\kappa(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$, $\sigma \in \mathbb{R} - \{0\}$,

where $x$ and $z$ are samples in $\mathbb{R}^d$. It is worth stressing here that the kernel matrix is of size $N \times N$ and contains in each entry $K_{ij}$ the information of distance between every possible pixel pair ($x_i$ and $x_j$) measured

0196-2892/$25.00 © 2009 IEEE
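As a minimal sketch (not the paper's implementation), the construction above can be checked numerically: build the $N \times N$ kernel matrix $K = [\kappa(x_i, x_j)]$ for each of the three kernels listed and verify that it is symmetric and positive semidefinite, as Mercer's theorem requires. The sample data, polynomial degree $r$, and width $\sigma$ are illustrative assumptions.

```python
import numpy as np

def kernel_matrix(X, kappa):
    """Gram matrix K = [kappa(x_i, x_j)] over the rows of X."""
    N = X.shape[0]
    return np.array([[kappa(X[i], X[j]) for j in range(N)] for i in range(N)])

def linear(x, z):
    return np.dot(x, z)

def polynomial(x, z, r=2):
    return (np.dot(x, z) + 1.0) ** r

def rbf(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))  # N = 6 illustrative samples in R^4

for kernel in (linear, polynomial, rbf):
    K = kernel_matrix(X, kernel)
    assert K.shape == (6, 6)                    # K is N x N
    assert np.allclose(K, K.T)                  # symmetric
    assert np.linalg.eigvalsh(K).min() > -1e-9  # PSD up to round-off
```

Note that the Gram matrix must be recomputed for every pair of samples, which is the $N \times N$ storage and computation cost the text points out.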