SPARSE REPRESENTATIONS FOR HYPERSPECTRAL DATA CLASSIFICATION
Salman Siddiqui, Stefan Robila, Jing Peng, Dajin Wang
Montclair State University
{siddiquis1, robilas, pengj, wangd}@mail.montclair.edu
ABSTRACT
We investigate the use of sparse principal components for
representing hyperspectral imagery when performing feature
selection. For conventional multispectral data with low
dimensionality, dimension reduction can be achieved by
using traditional feature selection techniques for producing a
subset of features that provide the highest class separability,
or by feature extraction techniques via linear transformation.
When dealing with hyperspectral data, feature selection is a
time-consuming task, often requiring an exhaustive search of
all the feature subset combinations. Instead, feature
extraction techniques such as PCA are commonly used.
Unfortunately, PCA usually involves non-zero linear
combinations, or "loadings", of all of the data. Sparse
principal components are the sets of sparse vectors spanning
a low-dimensional space that explain most of the variance
present in the data. Our experiments show that sparse
principal components of low dimensionality still
characterize the variance in the data. Sparse data
representations are generally desirable for hyperspectral
images because they aid both human interpretation and
classification.
Index Terms— Hyperspectral data, PCA, SPCA, DSPCA,
Sparse representation
1. INTRODUCTION
Data of all kinds are now collected and processed on scales
unimaginable until recently, thanks to the exceptional
processing power available. Recent advances in high-throughput data
acquisition, digital storage, computer processing and
communication technologies have made it possible to
gather, store, and transmit large volumes of data.
Hyperspectral imaging is a powerful tool for many real
world applications such as agriculture, mining, defense and
environmental monitoring. However, hyperspectral imagery
tends to be more difficult to process due to high
dimensionality [16]. To address this problem, feature
extraction techniques such as Principal Component Analysis
(PCA) are most often applied. However, in PCA each
resulting principal component (PC) is a linear combination
of all the original hyperspectral bands. This makes the
derived PCs difficult to interpret and the transformed
hyperspectral data expensive to classify. To mitigate the
problem, rotation techniques and segmented PCA are
commonly used. Each has its own drawbacks. The informal
thresholding approach used in rotation techniques can be
misleading [5], while segmented PCA may not be the most
efficient way to segment spectral bands if the goal is to
detect a specific target type [6].
The rapid increase in volume associated with adding extra
dimensions to a mathematical space causes obvious problems,
often referred to as the “curse of dimensionality”.
High-dimensional data are difficult to work with for several
reasons. Adding more features means more noise and hence
more error. The
complexity grows exponentially with the number of
dimensions, rapidly outstripping the computational and
memory storage capabilities of computers. The curse of
dimensionality complicates machine learning problems that
involve learning from a finite number of data samples in a
high-dimensional feature space.
In most cases, reducing the number of dimensions improves
efficiency. The costs associated with measurement, storage,
and computation also decrease as the dimension is reduced.
Dimension reduction improves classification performance,
helps in interpretation and modeling, enhances
generalization capability, and speeds up the learning
process considerably. Most often, dimension reduction is
applied to high-dimensional data through feature extraction:
a map from the multidimensional space into a space of fewer
dimensions. The original feature space is transformed by
applying, e.g., a linear transformation via principal
component analysis. Feature extraction thus constructs
combinations of the features that get around these problems
while still describing the data with sufficient accuracy.
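As a concrete sketch of this step (on synthetic data; the band count of 200 and pixel count of 1000 are illustrative assumptions, not values from the paper), PCA via the SVD projects each pixel's spectrum onto a few principal components:

```python
import numpy as np

# Synthetic stand-in for a hyperspectral image: 1000 pixels x 200 bands.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 200))

# Center the data, then obtain the principal axes from the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep only the first k components: each pixel is now a k-dimensional
# score vector instead of a 200-band spectrum.
k = 10
scores = Xc @ Vt[:k].T
print(scores.shape)  # (1000, 10)
```

The singular values `s` are non-increasing, so the retained components capture the largest shares of variance; note, however, that each row of `Vt` mixes all 200 bands, which is exactly the interpretability problem that sparse PCA addresses.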
Conventional sparse PCA typically applies simple
transforms such as axis rotations and component
thresholding [7]. Zou, Hastie and Tibshirani’s [10]
approach, called sparse PCA (SPCA), finds modified
components with zero loadings even in very large problems by
writing PCA as a regression-type optimization problem.
This allows the application of Lasso [14], a penalization
technique based on the ℓ1 norm. All these methods are
II - 577 978-1-4244-2808-3/08/$25.00 ©2008 IEEE IGARSS 2008
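The ℓ1 idea behind these penalized approaches can be sketched in a few lines of numpy: a power iteration with Lasso-style soft-thresholding drives small loadings exactly to zero. This is a simplified stand-in for, not a reproduction of, the regression-based SPCA of [10]; the synthetic data and penalty value are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, lam):
    # l1 (Lasso-style) shrinkage: loadings smaller than lam become exactly 0.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pc(X, lam=0.1, n_iter=100):
    # Leading sparse principal component via thresholded power iteration.
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)                  # sample covariance
    v = np.full(C.shape[1], 1.0 / np.sqrt(C.shape[1]))
    for _ in range(n_iter):
        v = soft_threshold(C @ v, lam)
        norm = np.linalg.norm(v)
        if norm == 0.0:                      # penalty wiped out every loading
            break
        v /= norm
    return v

# Synthetic data: only the first 3 of 20 "bands" carry real variance.
rng = np.random.default_rng(0)
signal = rng.standard_normal((500, 1))
X = 0.01 * rng.standard_normal((500, 20))
X[:, :3] += signal @ np.array([[3.0, 2.0, 1.0]])

v = sparse_pc(X)
print(np.count_nonzero(v))  # only the signal-carrying bands stay non-zero
```

In contrast to ordinary PCA, most loadings come out exactly zero, which is what makes each sparse component interpretable band by band.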