This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS

Remote Sensing Scene Classification Using Convolutional Features and Deep Forest Classifier

Yaakoub Boualleg, Student Member, IEEE, Mohamed Farah, and Imed Riadh Farah

Abstract—High-resolution remote sensing scene classification (HR-RSSC) plays an increasingly important role, as it aims to enhance semantic scene understanding. Recently, convolutional neural networks (CNNs) have proved their effectiveness in learning powerful feature representations for various visual recognition tasks. However, in the RS domain, CNN performance is still limited by the lack of sufficient labeled data. In this letter, we propose an HR-RSSC method based on CNN transfer learning (TL) for feature extraction (FE) and a deep forest (DF) for classification. We extract deep features from the last convolutional layer in order to avoid the fully connected layers (FCLs), which require many parameters to tune. Moreover, we train a DF model based on ensemble learning, which can achieve better performance than single classifiers and is easy to train with few parameters. We evaluate the proposed method on two RS image data sets. Compared with full training, fine-tuning, and state-of-the-art CNN TL methods, the results demonstrate the effectiveness of the DF model for HR-RSSC based on CNN TL in terms of overall accuracy and training time.

Index Terms—Convolutional neural network (CNN), deep forest (DF), remote sensing (RS), scene classification, transfer learning (TL).

I. INTRODUCTION

In recent years, large volumes of high-resolution remote sensing (RS) images have become publicly available. To mine high-quality information from these large-scale RS images, the research community has shown a growing interest in RS image analysis.
Significant efforts have been made to develop accurate high-resolution RS scene classification (HR-RSSC) methods, which aim to increase the semantic understanding of RS images by labeling each image with a specific semantic scene category. Most recent methods rely on deep learning (DL), and convolutional neural networks (CNNs) are the dominant DL architecture, used for most computer vision tasks. However, training CNNs from scratch requires a huge amount of labeled data. In addition, parameter tuning is a hard process that does not lend itself to theoretical analysis. Moreover, although the convolutional layers (Conv) of CNNs are powerful at extracting high-order features, ordinary CNN architectures use fully connected layers (FCLs) as the classifier, and it is widely known that FCLs easily overfit when the training data are small.

Manuscript received March 12, 2019; accepted April 14, 2019. (Corresponding author: Yaakoub Boualleg.) The authors are with the SIIVT-RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba 2010, Tunisia (e-mail: yaakoub.boualleg@ieee.org). Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2019.2911855

Several studies attempt to alleviate the overfitting problem by using dropout and regularization methods [1], [2] or by replacing the FCLs with a global average pooling (GAP) layer [3]. Another effective way to exploit CNNs on small-scale data sets is CNN transfer learning (TL), either by fine-tuning a pretrained model or by using the CNN as a feature extractor. In addition, recent studies have tended to replace the common neurons with decision trees [4] as an alternative solution to alleviate the mentioned deficiencies of deep neural networks (DNNs).
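To make the GAP alternative concrete, the sketch below shows how global average pooling collapses a stack of convolutional feature maps into one scalar per channel, so no FCL weights are needed. The tensor shape (512 maps of 7x7) is an illustrative assumption, not a value taken from this letter.

```python
import numpy as np

# Hypothetical output of a CNN's last convolutional layer:
# 512 feature maps, each 7x7 (shapes chosen for illustration only).
rng = np.random.default_rng(0)
fmaps = rng.random((512, 7, 7))  # (channels, height, width)

# Global average pooling: each feature map is reduced to its spatial
# mean, giving a 512-D descriptor with zero trainable parameters,
# instead of flattening 512*7*7 values into dense FCLs.
gap_features = fmaps.mean(axis=(1, 2))

print(gap_features.shape)  # (512,)
```

The resulting vector can be passed directly to a lightweight classifier, which is one reason GAP is a popular drop-in replacement for FCLs on small data sets.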
The deep forest (DF), first proposed by Zhou and Feng [4], is a recent DL architecture based on ensemble learning, in which multiple learners are trained and combined for a single task. DF architectures have few parameters compared with CNNs and are, therefore, easy to train at low computational cost. Furthermore, DF models can be applied to small data sets and have achieved performance competitive with CNNs on different classification tasks.

In order to fully exploit the advantages of the CNN in extracting high-order image features and of the DF as an ensemble-learning method that can outperform a single classifier, and motivated by the promising results of CNN TL methods, we propose an HR-RSSC method based on the TL of a pretrained CNN model combined with a DF classifier. The pretrained CNN model is used as a feature extractor of convolutional image features. The extracted feature maps (Fmaps) are then fed into the proposed DF model to predict the corresponding scene category. Unlike standard classifiers, the DF can handle the spatial relationships among the feature maps in the feature-representation stage using multigrained scanning (MGS). Then, in the second stage of tuning the DF model, the cascade forest structure (CFS) performs layer-by-layer feature processing, where the information is fed forward through the model layers until the final layer produces the class prediction.

II. RELATED WORKS

The RS domain still suffers from a lack of sufficient labeled samples due to the high cost of the labeling process. Exploiting CNNs for small-scale labeled RS image data sets has been widely investigated. Nogueira et al. [5] investigated three strategies for exploiting CNNs for HR-RSSC: full training, fine-tuning, and using CNNs as feature extractors. They concluded that using pretrained CNNs as feature extractors is the best strategy. Hu et al.
[6] investigated the strength of the features (Fmaps) extracted from the pretrained VGG16 CNN under two scenarios. In the first scenario, the Fmaps are extracted from the last FCL. In the second scenario, the Fmaps are extracted from the last

1545-598X © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
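The cascade forest structure (CFS) described in the Introduction can be sketched as follows. This is a minimal illustration in the spirit of Zhou and Feng's gcForest, not the letter's implementation: each level trains two forests whose class-probability vectors are concatenated onto the input features and fed forward to the next level. Level count, forest choices, and tree counts are assumed for illustration, and the cross-validated generation of class vectors used in the original gcForest is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

def cascade_forest_predict(X_tr, y_tr, X_te, n_levels=3, n_trees=100, seed=0):
    """Layer-by-layer cascade: each level's class-probability vectors
    augment the raw features for the next level (a simplified CFS sketch;
    real gcForest derives the vectors by k-fold cross-validation)."""
    aug_tr, aug_te = X_tr, X_te
    probas_te = None
    for level in range(n_levels):
        level_probas_tr, level_probas_te = [], []
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            f = Forest(n_estimators=n_trees, random_state=seed + level)
            f.fit(aug_tr, y_tr)
            level_probas_tr.append(f.predict_proba(aug_tr))
            level_probas_te.append(f.predict_proba(aug_te))
        # Feed the class vectors forward together with the raw features.
        aug_tr = np.hstack([X_tr] + level_probas_tr)
        aug_te = np.hstack([X_te] + level_probas_te)
        # The last level's averaged probabilities give the prediction.
        probas_te = np.mean(level_probas_te, axis=0)
    return probas_te.argmax(axis=1)

# Toy stand-in for CNN-extracted features (synthetic data, not RS imagery).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pred = cascade_forest_predict(X_tr, y_tr, X_te)
print("test accuracy:", (pred == y_te).mean())
```

In the full method, the inputs to such a cascade would be the CNN feature maps (after multigrained scanning) rather than synthetic vectors; the sketch only shows how information is fed forward over the cascade layers to the final class prediction.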