IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 47, NO. 4, APRIL 2009 1139
Kernel Nonparametric Weighted Feature Extraction
for Hyperspectral Image Classification
Bor-Chen Kuo, Cheng-Hsuan Li, and Jinn-Min Yang
Abstract—In recent years, many studies have shown that kernel methods are computationally efficient, robust, and stable for pattern analysis. Many kernel-based classifiers have been designed and applied to classify remotely sensed data, and some results show that kernel-based classifiers achieve satisfying performance. Many studies on hyperspectral image classification have also shown that nonparametric weighted feature extraction (NWFE) is a powerful tool for extracting hyperspectral image features. However, NWFE is still based on a linear transformation. In this paper, the kernel method is applied to extend NWFE to kernel-based NWFE (KNWFE). The new KNWFE possesses the advantages of both linear and nonlinear transformation, and the experimental results show that KNWFE outperforms NWFE, decision-boundary feature extraction, independent component analysis, kernel-based principal component analysis, and generalized discriminant analysis.
Index Terms—Feature extraction, image classification.
I. INTRODUCTION

IN RECENT years, many studies [1]–[7] have shown that kernel methods are computationally efficient, robust, and stable for pattern analysis. Many kernel-based classifiers have been designed and applied to classify remotely sensed data, and some results show that kernel-based classifiers achieve satisfying performance [5]–[7]. The main idea of kernel methods is to map the input data from the original space to a convenient feature space by a nonlinear mapping, where inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping explicitly, and linear relations are sought among the images of the data items in the feature space.
Some studies [8]–[13] have also shown that nonparametric weighted feature extraction (NWFE) [14] is powerful in reducing the dimensionality of hyperspectral image data. In this paper, we combine the advantages of the kernel method and NWFE and develop a kernel-based NWFE (KNWFE) for hyperspectral image classification.
This paper is organized as follows. The kernel trick is introduced in Section II. Reviews of some unsupervised and supervised feature extraction methods and their kernel versions are given in Section III. KNWFE is proposed in Section IV. To reduce the influence of the singularity of the kernel matrix, the eigenvalue resolution is discussed in Section V. To evaluate the performance of the proposed method on real hyperspectral image data, an experiment is designed in Section VI, where the experimental results are also reported. Section VII contains comments and conclusions.

Manuscript received April 22, 2008; revised July 3, 2008 and September 4, 2008. Current version published March 27, 2009.

B.-C. Kuo is with the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung 40306, Taiwan (e-mail: kbc@mail.ntcu.edu.tw).

C.-H. Li is with the Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung 40306, Taiwan, and also with the Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu 300, Taiwan.

J.-M. Yang is with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621, Taiwan.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2008.2008308
II. KERNEL TRICK

Classification is easier if pixels are more sparsely distributed. Generally speaking, images with high dimensionality (the number of spectral bands) potentially have better class separability. The strategy of the kernel method is to embed the data from the original space $\mathbb{R}^d$ into a feature space $\mathcal{H}$, a Hilbert space of higher dimensionality, where more effective hyperplanes for classification are expected to exist than in the original space. The inner product of samples in the feature space can then be computed directly from the original data items using a kernel function. This is based on the fact that any kernel function $\kappa : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ satisfies Mercer's theorem [4], i.e., there is a feature map $\phi$ into a Hilbert space $\mathcal{H}$ such that $\kappa(x, z) = \langle \phi(x), \phi(z) \rangle$, where $x, z \in X$, if and only if it is a symmetric function for which the matrices $K = [\kappa(x_i, x_j)]_{1 \le i,j \le N}$ formed by restriction to any finite subset $\{x_1, \ldots, x_N\}$ of the space $X$ are positive semidefinite.
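As an illustration (not part of the original paper), Mercer's condition can be checked numerically: for any finite set of samples, the Gram matrix of a valid kernel must be symmetric positive semidefinite. The sketch below builds the matrix for the Gaussian RBF kernel on random points; the sample sizes and bandwidth are arbitrary choices.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))  # 20 samples in R^5 (toy data)

# Gram matrix K = [kappa(x_i, x_j)] restricted to this finite subset.
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Mercer's condition: K is symmetric positive semidefinite.
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-10
```

Note that positive semidefiniteness holds for any choice of the finite subset, which is what makes the kernel admissible as an implicit inner product.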
Suppose $x_1^{(i)}, \ldots, x_{N_i}^{(i)} \in \mathbb{R}^d$ are the samples in class $i$, $i = 1, \ldots, L$, and $N = N_1 + \cdots + N_L$. Let $X_i^T = [\phi(x_1^{(i)}), \ldots, \phi(x_{N_i}^{(i)})]$ and $X^T = [X_1^T, \ldots, X_L^T]$; then the kernel matrix $K = [\kappa(x_i, x_j)]_{1 \le i,j \le N}$ with respect to $\kappa$ on the samples is $X X^T$, i.e., $K = X X^T$.
The following are some popular kernels.
1) Linear kernel: $\kappa(x, z) = \langle x, z \rangle$.
2) Polynomial kernel: $\kappa(x, z) = (\langle x, z \rangle + 1)^r$, $r \in \mathbb{Z}^+$.
3) Gaussian radial-basis-function (RBF) kernel: $\kappa(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$, $\sigma \in \mathbb{R} \setminus \{0\}$
where $x$ and $z$ are samples in $\mathbb{R}^d$.
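The three kernels above can be written directly as functions; the sketch below follows the formulas as stated (the test vectors and parameter values are arbitrary illustrations).

```python
import numpy as np

def linear_kernel(x, z):
    # kappa(x, z) = <x, z>
    return x @ z

def polynomial_kernel(x, z, r=2):
    # kappa(x, z) = (<x, z> + 1)^r, with positive integer degree r
    return (x @ z + 1.0) ** r

def gaussian_rbf_kernel(x, z, sigma=1.0):
    # kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)), sigma != 0
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
assert linear_kernel(x, z) == 0.0            # orthogonal vectors
assert polynomial_kernel(x, z, r=2) == 1.0   # (0 + 1)^2
assert np.isclose(gaussian_rbf_kernel(x, x), 1.0)  # a point with itself
```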
It is worth stressing here that the kernel matrix is of size $N \times N$ and contains, in each position $K_{ij}$, the information of distance among all possible pixel pairs ($x_i$ and $x_j$) measured
0196-2892/$25.00 © 2009 IEEE