2696 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 53, NO. 5, MAY 2015

Spectral–Spatial Classification of Hyperspectral Images via Spatial Translation-Invariant Wavelet-Based Sparse Representation

Lin He, Member, IEEE, Yuanqing Li, Senior Member, IEEE, Xiaoxin Li, and Wei Wu, Member, IEEE

Abstract—For hyperspectral image (HSI) classification, it is challenging to adopt the methodology of sparse-representation-based classification. In this paper, we first propose an $\ell_1$-minimization-based spectral–spatial classification method for HSIs via a spatial translation-invariant wavelet (STIW)-based sparse representation (STIW-SR), wherein both the spectrum dictionary and the analyzed signal are formed with STIW features. Due to the capability of a STIW to reduce both the observation noise and the spatial nonstationarity while maintaining the ideal spectra, which is proved with our signal–interference–noise spectrum model involved, it is expected that the pixels in the same class congregate in a lower dimensional subspace, and the separations among class-specific subspaces are enhanced, thus yielding a highly discriminative sparse representation. Then, we develop an approach to evaluate the sparsity recoverability of an $\ell_1$-minimization on HSIs in a probabilistic framework. This approach takes into account not only the recovery probability under the given support length of the $\ell_0$-norm solution but also the a priori probability of the support length; consequently, it overcomes the inability of traditional mutual/cumulative coherence conditions to address high-coherence HSIs. This paper reveals that the higher sparsity recoverability of a STIW-SR leads to its higher classification accuracy and that the increasing coherence does not necessarily lead to a reduced sparsity recovery probability, and this paper verifies the connection between $\ell_0$- and $\ell_1$-minimizations on HSIs.
Experimental results from real-world HSIs suggest that our classification method significantly outperforms several representative spectral–spatial classifiers and support vector machines.

Index Terms—Hyperspectral image (HSI), sparse representation, sparsity recoverability, spatial translation-invariant wavelet (STIW), spectral–spatial classification.

Manuscript received June 16, 2013; revised December 21, 2013, April 26, 2014, and August 9, 2014; accepted October 4, 2014. This work was supported by the National High-Tech Research and Development Program of China (863 Program) under Grant 2012AA011601, by the National Natural Science Foundation of China under Grant 91120305 and Grant 61403144, and by the High-Level Talent Project of Guangdong Province of China.

L. He, Y. Li, and W. Wu are with the School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China (e-mail: helin@scut.edu.cn; auyqli@scut.edu.cn).

X. Li is with the Center for Computer Vision, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou 510275, China, and also with the College of Computer Science and Technology, Faculty of Information Technology, Zhejiang University of Technology, Hangzhou 310023, China.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2014.2363682

I. INTRODUCTION

Hyperspectral images (HSIs) contain not only spatial information but also rich spectral information [1], [2]. The methods used to classify such data can be roughly divided into pixelwise and spectral–spatial categories [3], [4]. The former assign a pixel to a class based exclusively on its spectrum, a notable example being the support vector machine (SVM), owing to its good performance [5], [6], whereas the latter utilize both the pixel's spectrum and its dependence on its spatial neighbors [4], [7]–[12].
A sparse representation originates from the observation that most natural signals can be effectively represented with a few coefficients on a basis set [13]–[18]. It has been successfully applied in image enhancement [19], signal source separation [20], [21], feature selection [22], compressive sensing [16], and biometric classification [23], [24]. Specifically, sparse-representation-based classification (SRC) has been introduced to the HSI processing community, wherein a spectrum-dictionary-based sparse representation (SDSR) is used [11], [12], [25]. In an SDSR, training pixels are used as the atoms of a dictionary $\Psi_{\mathrm{tr}}$, and if the involved $\ell_0$-minimization (i.e., $\min_{\alpha} \|\alpha\|_0$, s.t. $\Psi_{\mathrm{tr}} \alpha = x_{\mathrm{te}}$) is relaxed to an $\ell_1$-minimization, it can be formulated as

$$\min_{\alpha} \|\alpha\|_1, \quad \text{s.t.} \quad \Psi_{\mathrm{tr}} \alpha = x_{\mathrm{te}} \tag{1}$$

where $x_{\mathrm{te}}$ is the test pixel, and $\alpha$ is the vector of sparse coefficients.

No universally best method exists for all scenarios [26], [27]. When the SRC methodology is extended to HSI classification, the special characteristics of HSIs have to be considered. HSIs are characterized by high observation noise and spatial nonstationarity, whereas the SRC assumes that class-specific samples lie in low-dimensional subspaces and that a test sample can be expressed as a sparse linear combination of the training samples [24]. Hence, for an HSI, class-specific subspaces heavily interfere with one another. If we use the SRC directly, the resulting nonzero coefficients will spread across all of the classes, implying weak discriminability. The observation noise and the spatial nonstationarity have a drastic impact on the SRC. If they can be reduced, the SRC is expected to gain enhanced discriminability.

Moreover, to classify a sample, the SRC utilizes the sparsest coefficients, which in theory are obtained by solving an NP-hard $\ell_0$-minimization and are considered the objective for discrimination [11], [12], [24], [25]. If a convex $\ell_1$-minimization is used

0196-2892 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
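As a rough illustration of the SRC idea built on the $\ell_1$ relaxation in (1), the following minimal sketch classifies a toy test vector by sparse coding it over a small labeled dictionary. Several points are assumptions and are not taken from the paper: the equality-constrained problem (1) is replaced by its penalized (LASSO) form, solved with the iterative shrinkage-thresholding algorithm (ISTA); the decision uses the minimum class-wise reconstruction residual, the rule commonly used in SRC; the dictionary, labels, and test vector are synthetic; and all function names (`ista`, `src_classify`, and the helpers) are hypothetical. No STIW features are computed here. Plain Python lists are used so that the sketch runs without third-party packages.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def matvec(D, a):
    """Compute D @ a, with the dictionary D stored as a list of rows."""
    return [sum(d * c for d, c in zip(row, a)) for row in D]


def rmatvec(D, r):
    """Compute D.T @ r."""
    return [sum(D[i][j] * r[i] for i in range(len(D))) for j in range(len(D[0]))]


def ista(D, x, lam=0.01, n_iter=500):
    """Approximately minimize 0.5 * ||D a - x||_2^2 + lam * ||a||_1 (assumed
    penalized stand-in for the equality-constrained problem (1))."""
    # 1 / ||D||_F^2 is a safe step size because ||D||_F^2 bounds the
    # Lipschitz constant of the gradient of the quadratic term.
    step = 1.0 / sum(d * d for row in D for d in row)
    alpha = [0.0] * len(D[0])
    for _ in range(n_iter):
        resid = [p - q for p, q in zip(matvec(D, alpha), x)]
        grad = rmatvec(D, resid)
        alpha = soft_threshold(
            [a - step * g for a, g in zip(alpha, grad)], step * lam
        )
    return alpha


def src_classify(D, labels, x, lam=0.01, n_iter=500):
    """Assign x to the class whose atoms alone reconstruct it best."""
    alpha = ista(D, x, lam=lam, n_iter=n_iter)
    residuals = {}
    for c in sorted(set(labels)):
        # Keep only the coefficients that belong to class c.
        alpha_c = [a if lab == c else 0.0 for a, lab in zip(alpha, labels)]
        recon = matvec(D, alpha_c)
        residuals[c] = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, recon)))
    return min(residuals, key=residuals.get), residuals, alpha


# Synthetic toy dictionary: columns are unit-norm atoms, two per class.
D = [
    [1.0, 0.6, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.0, 0.8],
]
labels = [0, 0, 1, 1]          # class of each atom (column)
x_te = [0.8, 0.4, 0.1, 0.0]    # test vector lying mostly in the class-0 subspace

label, residuals, alpha = src_classify(D, labels, x_te)
print("predicted class:", label)
print("class-wise residuals:", residuals)
```

In this toy setting the two class subspaces are orthogonal, so the coefficients concentrate on the atoms of one class. When class-specific subspaces interfere with one another, as the text describes for HSIs, the nonzero coefficients spread across classes and the class-wise residuals become harder to tell apart.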