SPARSE REPRESENTATIONS FOR HYPERSPECTRAL DATA CLASSIFICATION

Salman Siddiqui, Stefan Robila, Jing Peng, Dajin Wang
Montclair State University
{siddiquis1, robilas, pengj, wangd}@mail.montclair.edu

ABSTRACT

We investigate the use of sparse principal components for representing hyperspectral imagery when performing feature selection. For conventional multispectral data with low dimensionality, dimension reduction can be achieved either by traditional feature selection techniques that produce the subset of features with the highest class separability, or by feature extraction techniques based on linear transformations. When dealing with hyperspectral data, feature selection is a time-consuming task, often requiring an exhaustive search over all feature subset combinations. Instead, feature extraction techniques such as PCA are commonly used. Unfortunately, PCA usually involves non-zero linear combinations, or "loadings", of all of the original variables. Sparse principal components are sets of sparse vectors spanning a low-dimensional space that explain most of the variance present in the data. Our experiments show that low-dimensional sparse principal components still characterize the variance in the data. Sparse representations are generally desirable for hyperspectral images because they aid both human interpretation and classification.

Index Terms: Hyperspectral data, PCA, SPCA, DSPCA, Sparse representation

1. INTRODUCTION

Data of all kinds are now collected and processed on scales that were unimaginable until recently, owing to the exceptional processing power available. Recent advances in high-throughput data acquisition, digital storage, computer processing and communication technologies have made it possible to gather, store, and transmit large volumes of data. Hyperspectral imaging is a powerful tool for many real-world applications such as agriculture, mining, defense and environmental monitoring.
However, hyperspectral imagery tends to be more difficult to process due to its high dimensionality [16]. To address this problem, feature extraction techniques such as Principal Component Analysis (PCA) are most often applied. However, in PCA each resulting principal component (PC) is a linear combination of all the original hyperspectral bands. This makes the derived PCs difficult to interpret and the transformed hyperspectral data expensive to classify. To mitigate the problem, rotation techniques and segmented PCA are commonly used, but each has its own drawbacks: the informal thresholding approach used in rotational techniques can be misleading [5], while segmented PCA may not be the most efficient way to partition the spectral bands if the goal is to detect a specific target type [6].

The rapid growth in volume that comes with adding extra dimensions to a mathematical space causes well-known problems, often referred to as the "curse of dimensionality". High-dimensional data are difficult to work with for several reasons. Adding more features means more noise and hence more error. Complexity grows exponentially with the number of dimensions, rapidly outstripping the computational and storage capabilities of computers. The curse of dimensionality complicates machine learning problems that involve learning from a finite number of samples in a high-dimensional feature space. In most cases, reducing the number of dimensions improves efficiency: the costs associated with measurement, storage and computation decrease, classification performance improves, interpretation and modeling become easier, generalization capability is enhanced, and learning is sped up considerably. For these reasons, dimension reduction is most often applied to high-dimensional data. Feature extraction maps the multidimensional space into a space of fewer dimensions.
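To make concrete why dense PCA loadings hamper interpretation, the following sketch runs PCA via an eigendecomposition of the band covariance matrix on a small synthetic matrix standing in for hyperspectral pixels. The data, dimensions, and threshold are hypothetical illustrations, not the paper's experimental setup.

```python
# Sketch: PCA on a synthetic stand-in for hyperspectral pixels,
# showing that each principal component places non-zero loadings
# on essentially every band, which makes the PCs hard to interpret.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bands = 500, 50                 # hypothetical cube: 500 pixels, 50 bands
X = rng.normal(size=(n_pixels, n_bands)) @ rng.normal(size=(n_bands, n_bands))

Xc = X - X.mean(axis=0)                     # center each band
cov = (Xc.T @ Xc) / (n_pixels - 1)          # band-by-band covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # sort components by explained variance
loadings = eigvecs[:, order]                # column k = loadings of PC k

pc1 = loadings[:, 0]
print("non-zero loadings in PC1:", np.count_nonzero(np.abs(pc1) > 1e-12))
# In practice nearly all 50 loadings are non-zero: every PC mixes all bands.
```

This density of the loadings is exactly what the sparse variants discussed next try to remove.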
This means that the original feature space is transformed by applying, e.g., a linear transformation such as principal component analysis. Feature extraction constructs combinations of the features that get around these problems while still describing the data with sufficient accuracy. Conventional sparse PCA typically applies simple transforms such as axis rotations and component thresholding [7]. The approach of Zou, Hastie and Tibshirani [10], called sparse PCA (SPCA), finds modified components with zero loadings in very large problems by writing PCA as a regression-type optimization problem. This allows the application of the Lasso [14], a penalization technique based on the l1 norm. All these methods are

978-1-4244-2808-3/08/$25.00 ©2008 IEEE    IGARSS 2008
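To illustrate the sparse-loading idea on synthetic data, here is a minimal sketch using an l1-penalized (soft-thresholded) power iteration for the leading component. This is a simplified variant chosen for brevity, not the SPCA regression formulation of Zou, Hastie and Tibshirani nor the DSPCA semidefinite formulation; all data and parameters are hypothetical.

```python
# Sketch: a soft-thresholded power iteration that yields a sparse
# leading component. Illustrative only; not the cited SPCA/DSPCA methods.
import numpy as np

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam (the l1 proximal step)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_pc1(cov, lam=0.3, n_iter=200):
    """Leading sparse principal component of a covariance matrix.
    lam scales the threshold relative to the largest entry of cov @ v."""
    v = np.ones(cov.shape[0]) / np.sqrt(cov.shape[0])
    for _ in range(n_iter):
        w = cov @ v
        v = soft_threshold(w, lam * np.max(np.abs(w)))
        nrm = np.linalg.norm(v)
        if nrm == 0.0:
            break
        v = v / nrm
    return v

rng = np.random.default_rng(1)
# Hypothetical data in which only a few "bands" carry the signal.
n, p = 300, 30
signal = rng.normal(size=(n, 1))
X = 0.05 * rng.normal(size=(n, p))
X[:, :4] += signal                      # bands 0-3 share one strong component
cov = np.cov(X, rowvar=False)

v = sparse_pc1(cov, lam=0.3)
print("non-zero loadings:", np.count_nonzero(np.abs(v) > 1e-8))
# Only the few informative bands receive non-zero loadings; the rest are
# exactly zero, which is what makes sparse components interpretable.
```

Unlike the dense PCs above, the recovered component names a small, readable subset of bands, which is the property that motivates SPCA and DSPCA for hyperspectral data.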