IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 60, 2022, 5509612

Hyperspectral Image Classification Using Attention-Based Bidirectional Long Short-Term Memory Network

Shaohui Mei, Senior Member, IEEE, Xingang Li, Xiao Liu, Huimin Cai, and Qian Du, Fellow, IEEE

Abstract—Deep neural networks have been widely applied to hyperspectral image (HSI) classification, among which the recurrent neural network (RNN) is one of the most typical architectures. Most existing RNN-based classifiers treat the spectral signature of a pixel as an ordered sequence, in which only unidirectional correlation between adjacent bands along the wavelength direction is considered. However, each band image is related not only to its preceding band images but also to its successive band images. In order to fully explore such bidirectional spectral correlation within an HSI, a bidirectional long short-term memory (Bi-LSTM)-based network is designed in this article for HSI classification. Moreover, a spatial–spectral attention mechanism is designed and implemented in the proposed Bi-LSTM network to emphasize effective information and suppress redundant information in the spatial–spectral context of pixels, by which the classification performance can be greatly improved. Experimental results over three benchmark HSIs, i.e., Salinas Valley, Pavia Centre, and Pavia University, demonstrate that the proposed Bi-LSTM clearly outperforms several state-of-the-art unidirectional RNN-based classification algorithms. Moreover, the proposed spatial–spectral attention mechanism can further improve the classification accuracy of the proposed Bi-LSTM algorithm by effectively weighting the spatial and spectral context of pixels. The source code of the proposed Bi-LSTM algorithm is available at https://github.com/MeiShaohui/Attention-based-Bidirectional-LSTM-Network.

Index Terms—Attention network, classification, deep learning, hyperspectral image (HSI), recurrent neural network (RNN).

Manuscript received February 7, 2021; revised June 24, 2021 and July 16, 2021; accepted July 27, 2021. Date of publication August 11, 2021; date of current version January 17, 2022. This work was supported in part by the Fundamental Research Funds for the Central Universities and in part by the National Natural Science Foundation of China under Grant 61671383. (Corresponding author: Shaohui Mei.)

Shaohui Mei, Xingang Li, and Xiao Liu are with the School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi 710129, China (e-mail: meish@nwpu.edu.cn). Huimin Cai is with Tianjin Jinhang Institute of Technical Physics, Tianjin 300192, China. Qian Du is with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762 USA.

Digital Object Identifier 10.1109/TGRS.2021.3102034

I. INTRODUCTION

HYPERSPECTRAL imaging sensors can obtain abundant spectral information of objects while preserving their spatial information, which makes it possible to explore both spectral and spatial characteristics. Compared with conventional color images or multispectral remote sensing images, hyperspectral images (HSIs) offer greatly improved information richness, leading to great interest in many application fields, such as medicine, agriculture, industry, and food [1]–[4].

HSI classification, which assigns labels to pixels by exploring their spectral signatures and spatial context, has attracted great attention in the past decades. A simple way toward this purpose is to directly feed spectral pixel vectors into conventional classifiers [5]. For example, Melgani and Bruzzone [6] and Camps-Valls et al. [7] addressed the classification of HSIs with support vector machines (SVMs), while Ham et al. [8] and Belgiu and Drăguţ [9] proposed to classify HSIs using a random forest (RF) classifier.
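The pixel-wise pipeline described above, in which each pixel's spectral vector is fed to a conventional classifier, can be sketched as follows. This is a minimal illustration, not the method of [6]–[9]: a nearest-centroid rule stands in for the SVM/RF classifiers, and the function names, cube shape, and labels are assumptions made for the example.

```python
import numpy as np

def classify_pixels(cube, train_mask, train_labels):
    """Pixel-wise HSI classification: each pixel's spectral vector is the
    feature. A nearest-centroid rule stands in for an SVM or RF."""
    h, w, bands = cube.shape
    X = cube.reshape(-1, bands)          # (n_pixels, n_bands) spectral vectors
    Xtr = X[train_mask.ravel()]          # labeled training pixels
    classes = np.unique(train_labels)
    centroids = np.stack([Xtr[train_labels == c].mean(axis=0) for c in classes])
    # Assign every pixel to the class with the nearest spectral centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)].reshape(h, w)
```

In practice the training pixels would come from labeled ground truth, and the centroid rule would be replaced by a trained SVM or RF; the point here is only that the spectral vector alone serves as the feature, ignoring spatial context.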
However, due to the high spectral dimensionality, directly using the spectral information of HSIs can easily lead to the curse of dimensionality (i.e., the Hughes effect) [10], [11]. Therefore, many methods have been proposed to extract the discriminative features implied in the high-dimensional spectral signatures [12], [13], among which the representative algorithms are principal component analysis (PCA) [14]–[16], linear discriminant analysis (LDA) [17], manifold learning-based methods [18], [19], and graph embedding [20], [21].

In recent years, many attempts at HSI classification have been made with deep learning [22], [23], in which the convolutional neural network (CNN) has achieved many successes [24], [25]. Generally, a CNN can conduct feature extraction over HSIs in different dimensions for classification tasks. For example, the 1-D CNN (1DCNN) directly feeds 1-D spectral vectors into the network for classification [26], [27], by which the relationships between the spectral signatures associated with each HSI pixel and the information contained in them are learned [28], [29]. In order to learn spatial feature representations from the data, the 2-D CNN is applied to hyperspectral data whose dimensionality has been reduced by PCA [30]–[32]. In order to better exploit the high-dimensional data structure of HSIs, 3-D convolution is directly used in many CNNs to explore the spatial–spectral properties of HSIs for classification [33], [34], such as the multiscale 3-D deep CNN (M3D-CNN) [35] and HSI-CNN [36]. Lightweight versions of CNN-based HSI classification have also been explored in [37] and [38]. Autoencoders (AEs) have also been used as deep models to perform unsupervised coding of HSI data. For example, an unsupervised tied AE (TAE) was proposed for spectral feature extraction [39]. Spectral–spatial feature extraction has also been implemented using AE-based networks, such as the stacked AE.
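As a concrete illustration of the PCA preprocessing commonly applied before 2-D CNN classification, the following NumPy sketch projects each pixel's spectrum onto the leading principal components. The cube shape and component count are illustrative assumptions, not the configuration used in [30]–[32].

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    """Project each pixel's spectral vector onto the top principal
    components, reducing the band dimension while keeping spatial layout."""
    h, w, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    X -= X.mean(axis=0)                  # center the spectral vectors per band
    # Principal directions via SVD of the centered (n_pixels, n_bands) matrix;
    # rows of Vt are ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (X @ Vt[:n_components].T).reshape(h, w, n_components)
```

Because the reduced cube keeps the spatial layout intact, fixed-size patches around each pixel can then be cropped and fed to a 2-D CNN, which is the usual role of this step.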
1558-0644 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.