ISPRS Journal of Photogrammetry and Remote Sensing 171 (2021) 188–201
0924-2716/© 2020 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
Fully convolutional recurrent networks for multidate crop recognition from multitemporal image sequences
Jorge Andres Chamorro Martinez a,*, Laura Elena Cué La Rosa a, Raul Queiroz Feitosa a, Ieda Del’Arco Sanches b, Patrick Nigri Happ a
a Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro 22451900, Brazil
b National Institute for Space Research, São José dos Campos 12227010, Brazil
ARTICLE INFO
Keywords:
Convolutional recurrent networks
Fully convolutional networks
Recurrent networks
Crop recognition
Deep learning
Remote sensing
ABSTRACT
Crop recognition in tropical regions is a challenging task because of the highly complex crop dynamics, with
multiple crops per year. Nevertheless, most automatic methods proposed thus far are devoted to temperate areas,
where normally a single crop is cultivated throughout the crop year. This paper introduces convolutional recurrent
networks for crop recognition in areas characterized by the complex spatiotemporal dynamics typical of tropical
agriculture, where a per-date classification is required. The proposed approach consists of two sequential steps.
First, a deep network simultaneously models spatial and temporal contexts. Second, a post-processing algorithm
enforces prior knowledge about the crop dynamics in the target area based on the posterior probabilities
computed in the first step. The paper proposes deep network architectures that join a fully convolutional network
(FCN), which models spatial context at multiple levels, and a bidirectional recurrent neural network, which explores
the temporal context. The recurrent network is configured as N-to-N, where N is the sequence length, which allows a
single network to produce classification outcomes for the entire sequence of multitemporal images.
Different network designs are proposed based on three FCN architectures: U-Net, dense network, and Atrous
Spatial Pyramid Pooling. A convolutional Long Short-Term Memory (ConvLSTM) accounts for sequence modeling,
whereas the Most Likely Class Sequence (MLCS) algorithm is adopted for enforcing prior knowledge. Finally, the
paper reports experiments conducted on Sentinel-1 data from two publicly available datasets of different
tropical regions. The experimental results indicated that the proposed architectures outperformed state-of-the-art
methods based on recurrent networks in terms of Overall Accuracy and per-class F1 score.
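To illustrate the second step, the sketch below selects the most likely class sequence from per-date posterior probabilities. It assumes, for illustration only, that prior knowledge is given as an explicit list of admissible class sequences and that each sequence is scored by the product of its per-date posteriors (summed in log space); the exact MLCS formulation used in the paper may differ. The function name `most_likely_class_sequence`, the class indices (0 = soybean, 1 = maize, 2 = fallow), and all probabilities are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def most_likely_class_sequence(
    posteriors: Sequence[Sequence[float]],
    admissible: Sequence[Sequence[int]],
) -> Tuple[List[int], float]:
    """Pick the admissible class sequence with the highest joint log-probability.

    posteriors[t][c]: posterior of class c on date t, one row per date, as an
    N-to-N network would output. admissible: class sequences allowed by prior
    knowledge about the crop dynamics (assumed here to be listed explicitly).
    """
    eps = 1e-12  # avoids log(0) when the network assigns zero probability
    best_seq: List[int] = []
    best_score = -math.inf
    for seq in admissible:
        if len(seq) != len(posteriors):
            raise ValueError("each admissible sequence needs one label per date")
        # Product of per-date posteriors, computed as a sum of logs for stability.
        score = sum(math.log(posteriors[t][c] + eps) for t, c in enumerate(seq))
        if score > best_score:
            best_seq, best_score = list(seq), score
    if not best_seq:
        raise ValueError("no admissible sequence was given")
    return best_seq, best_score


# Hypothetical classes: 0 = soybean, 1 = maize, 2 = fallow; four dates.
posteriors = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
]
admissible = [[0, 0, 0, 2], [0, 0, 1, 1], [1, 1, 1, 1]]

per_date_argmax = [max(range(len(row)), key=row.__getitem__) for row in posteriors]
seq, score = most_likely_class_sequence(posteriors, admissible)
print(per_date_argmax)  # [0, 1, 0, 2]: maize on date 2 alone is implausible
print(seq)              # [0, 0, 0, 2]: soybean until harvest, then fallow
```

Taking the per-date argmax ignores the temporal consistency of the labels, whereas restricting the choice to admissible sequences lets a confident majority of dates correct an isolated, implausible label.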
1. Introduction
By 2050, more than 1 billion hectares of wildland will be needed as
agricultural land to meet the food requirements of the steadily growing
population (Sachs et al., 2010). Consequently, global agricultural
production must increase with minimum environmental impact. In this
context, timely information about crop extent is essential to support
decision-makers in the governmental and private sectors. Remote sensing
(RS) data from satellite sensors provide cost-effective, timely, and
reliable information over large areas (Thenkabail, 2015).
Crop recognition from RS data is a challenging task, particularly in
tropical regions, where the favorable climate, combined with the use of
modern technologies, makes agriculture highly dynamic (Sanches et al.,
2018b).
In recent years, deep neural networks have made several breakthroughs
in fields such as computer vision and speech recognition (LeCun
et al., 2015) and, more recently, in diverse remote sensing applications
(Ma et al., 2019b; Gu et al., 2019). Such models can be roughly
grouped into two main categories: Convolutional Neural Networks
(CNNs), which exploit the spatial context, and Recurrent Neural Networks
(RNNs), which mostly model data sequences.
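As a rough illustration of how a recurrent network can be configured as bidirectional and N-to-N, the Python sketch below runs a scalar tanh RNN forward and backward over a toy sequence and emits one output per date, so each output sees both past and future dates. It is not the ConvLSTM used in the paper: the scalar state, the weights, the logistic readout standing in for a per-date classifier, and the names `rnn_pass` and `bidirectional_n_to_n` are all hypothetical simplifications.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float]  # (w_x, w_h, b): hypothetical scalar weights


def rnn_pass(xs: Sequence[float], params: Params) -> List[float]:
    """Run a scalar tanh RNN over xs and return one hidden state per step."""
    w_x, w_h, b = params
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def bidirectional_n_to_n(
    xs: Sequence[float], fwd: Params, bwd: Params, readout: Params
) -> List[float]:
    """Return one output per input step (N inputs -> N outputs).

    Each output combines the forward state (past context) and the backward
    state (future context) at the same date through a logistic readout.
    """
    forward = rnn_pass(xs, fwd)
    backward = rnn_pass(list(xs)[::-1], bwd)[::-1]  # re-align to time order
    v_f, v_b, c = readout
    return [
        1.0 / (1.0 + math.exp(-(v_f * hf + v_b * hb + c)))
        for hf, hb in zip(forward, backward)
    ]


# Toy sequence of four per-date values (hypothetical, e.g. one pixel's backscatter).
xs = [0.1, 0.5, -0.2, 0.8]
outputs = bidirectional_n_to_n(
    xs, fwd=(1.0, 0.5, 0.0), bwd=(0.8, 0.3, 0.0), readout=(1.0, 1.0, 0.0)
)
print(len(outputs))  # 4: one score per date
```

In a convolutional recurrent network, the scalar hidden state would be replaced by feature maps and the multiplications by convolutions, but the bookkeeping of running both directions and pairing their states date by date stays the same.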
A group of CNN variants, called Fully Convolutional Networks (FCNs)
(Long et al., 2015), has been widely used for semantic segmentation.
Among the many RNN variants proposed so far, the Long Short-Term
Memory (LSTM) (Hochreiter and Schmidhuber, 1997) is the most
commonly used, mainly due to its ability to overcome the vanishing
* Corresponding author.
E-mail addresses: jchamorro@ele.puc-rio.br (J.A. Chamorro Martinez), lauracue@ele.puc-rio.br (L.E. Cué La Rosa), raul@ele.puc-rio.br (R.Q. Feitosa), ieda.sanches@inpe.br (I.D. Sanches), patrick@ele.puc-rio.br (P.N. Happ).
URL: http://www.lvc.ele.puc-rio.br/wp/ (J.A. Chamorro Martinez).
Contents lists available at ScienceDirect
ISPRS Journal of Photogrammetry and Remote Sensing
journal homepage: www.elsevier.com/locate/isprsjprs
https://doi.org/10.1016/j.isprsjprs.2020.11.007
Received 5 May 2020; Received in revised form 8 October 2020; Accepted 11 November 2020