ISPRS Journal of Photogrammetry and Remote Sensing 171 (2021) 188–201

0924-2716/© 2020 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

Fully convolutional recurrent networks for multidate crop recognition from multitemporal image sequences

Jorge Andres Chamorro Martinez a,*, Laura Elena Cué La Rosa a, Raul Queiroz Feitosa a, Ieda Del'Arco Sanches b, Patrick Nigri Happ a

a Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro 22451900, Brazil
b National Institute for Space Research, São José dos Campos 12227010, Brazil

ARTICLE INFO

Keywords: Convolutional recurrent networks, Fully convolutional networks, Recurrent networks, Crop recognition, Deep learning, Remote sensing

ABSTRACT

Crop recognition in tropical regions is a challenging task because of the highly complex crop dynamics, with multiple crops per year. Nevertheless, most automatic methods proposed thus far are devoted to temperate areas, where normally a single crop is cultivated during the crop year. This paper introduces convolutional recurrent networks for crop recognition in areas characterized by the complex spatiotemporal dynamics typical of tropical agriculture, where a per-date classification is required. The proposed approach consists of two sequential steps. First, a deep network simultaneously models spatial and temporal contexts. Second, a post-processing algorithm enforces prior knowledge about the crop dynamics in the target area, based on the posterior probabilities computed in the first step. The paper proposes deep network architectures that join a fully convolutional network (FCN), which models spatial context at multiple levels, and a bidirectional recurrent neural network, which explores the temporal context. The recurrent network is configured as N-to-N, where N is the sequence length. This allows it to produce classification outcomes for the entire sequence of multitemporal images using a single network.
Different network designs are proposed based on three FCN architectures: U-Net, dense network, and Atrous Spatial Pyramid Pooling. A convolutional Long Short-Term Memory (ConvLSTM) accounts for sequence modeling, whereas the Most Likely Class Sequence (MLCS) algorithm is adopted for enforcing prior knowledge. Finally, the paper reports experiments conducted on Sentinel-1 data from two publicly available datasets covering different tropical regions. The experimental results indicate that the proposed architectures outperform state-of-the-art methods based on recurrent networks in terms of Overall Accuracy and per-class F1 score.

1. Introduction

By 2050, more than 1 billion hectares of wildland will be needed as agricultural land to meet the food requirements of the steadily growing population (Sachs et al., 2010). Consequently, it is necessary to increase global agricultural production with minimum environmental impact. In this context, timely information about crop extent is essential to support decision-makers in both the governmental and the private sector. Remote sensing (RS) data from satellite sensors provide cost-effective, timely, and reliable information over large areas (Thenkabail, 2015). Crop recognition from RS data is a challenging task, particularly in tropical regions, because the favorable climate, combined with the use of modern technologies, makes agriculture highly dynamic (Sanches et al., 2018b).

In recent years, deep neural networks have made several breakthroughs in fields such as computer vision and speech recognition (LeCun et al., 2015) and, more recently, in diverse remote sensing applications (Ma et al., 2019b; Gu et al., 2019). Such models can be roughly grouped into two main categories: Convolutional Neural Networks (CNN), which explore the spatial context, and Recurrent Neural Networks (RNN), which are mostly used to model data sequences.
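As a rough illustration of the second step summarized in the abstract (enforcing prior knowledge about crop dynamics on the per-date posterior probabilities), the sketch below decodes the highest-scoring class sequence for a single pixel with a generic Viterbi-style dynamic program. It is not the MLCS algorithm as specified by the authors, whose details are not given in this section. The function name, the binary matrix of allowed class transitions, and the toy posteriors are all assumptions made for illustration.

```python
import numpy as np

def most_likely_sequence(posteriors, allowed):
    """Viterbi-style decoding for one pixel (illustrative, not the paper's MLCS).

    posteriors: (N, C) array of per-date class probabilities from the network.
    allowed:    (C, C) boolean array; allowed[i, j] is True when class j may
                follow class i on the next date (hypothetical prior knowledge).
    Returns a length-N integer array with the best allowed class sequence.
    """
    n_dates, n_classes = posteriors.shape
    log_p = np.log(np.clip(posteriors, 1e-12, None))
    # Forbidden transitions score -inf, so they are never selected.
    log_t = np.where(allowed, 0.0, -np.inf)

    score = log_p[0].copy()                      # best score ending in each class
    back = np.zeros((n_dates, n_classes), dtype=int)
    for t in range(1, n_dates):
        cand = score[:, None] + log_t            # indexed as (previous, current)
        back[t] = np.argmax(cand, axis=0)        # best predecessor per class
        score = cand[back[t], np.arange(n_classes)] + log_p[t]

    # Trace back from the best final class.
    path = np.empty(n_dates, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_dates - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy example (assumed classes): 0 = bare soil, 1 = crop A, 2 = crop B.
# Assumed rule: one crop cannot turn directly into the other.
allowed = np.array([[True, True,  True],
                    [True, True,  False],
                    [True, False, True]])
posteriors = np.array([[0.1, 0.8, 0.1],
                       [0.2, 0.3, 0.5],
                       [0.1, 0.8, 0.1],
                       [0.7, 0.2, 0.1]])

print(posteriors.argmax(axis=1))                  # per-date argmax: [1 2 1 0]
print(most_likely_sequence(posteriors, allowed))  # constrained:     [1 1 1 0]
```

In this toy case the per-date argmax yields the sequence crop A, crop B, crop A, bare soil, which violates the assumed transition rule; the constrained decoding replaces the isolated crop B label with crop A.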
* Corresponding author.
E-mail addresses: jchamorro@ele.puc-rio.br (J.A. Chamorro Martinez), lauracue@ele.puc-rio.br (L.E. Cué La Rosa), raul@ele.puc-rio.br (R.Q. Feitosa), ieda.sanches@inpe.br (I.D. Sanches), patrick@ele.puc-rio.br (P.N. Happ).
URL: http://www.lvc.ele.puc-rio.br/wp/ (J.A. Chamorro Martinez).
https://doi.org/10.1016/j.isprsjprs.2020.11.007
Received 5 May 2020; Received in revised form 8 October 2020; Accepted 11 November 2020

A group of CNN variants, called Fully Convolutional Networks (FCN) (Long et al., 2015), has been widely used for semantic segmentation. Among the many RNN variants proposed so far, the Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) is the most commonly used, mainly due to its ability to overcome the vanishing