Remote Sensing of Environment
Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques
Sherrie Wang (a, b, *), George Azzari (a), David B. Lobell (a)
(a) Department of Earth System Science, Center on Food Security and the Environment, Stanford University, United States of America
(b) Institute for Computational and Mathematical Engineering, Stanford University, United States of America
ARTICLE INFO
Keywords:
Classification
Unsupervised learning
Agriculture
Landsat
Land cover
Machine learning
Google Earth Engine
Big data
Remote sensing
ABSTRACT
Crop type mapping at the field level is necessary for a variety of applications in agricultural monitoring and food
security. As remote sensing imagery continues to increase in spatial and temporal resolution, it is becoming an
increasingly powerful raw input from which to create crop type maps. Still, automated crop type mapping
remains constrained by a lack of field-level crop labels for training supervised classification models. In this study,
we explore two approaches to crop type mapping in the US Midwest: random forests transferred across geographic
distance and time, and unsupervised methods used in conjunction with aggregate crop statistics. We simulated the
label-poor setting by depriving the models of labels in various states and years. We validated our methodology
using available 30 m spatial resolution crop type labels from the US Department of Agriculture's Cropland Data
Layer (CDL). Using Google Earth Engine, we computed Fourier transforms (or harmonic regressions) on the time
series of Landsat Surface Reflectance and derived vegetation indices, and extracted the coefficients as features
for machine learning models. We found that random forests trained on regions and years similar in growing
degree days (GDD) transfer to the target region with accuracies consistently exceeding 80%. Accuracies decrease
as differences in GDD expand. Unsupervised Gaussian mixture models (GMM) with class labels derived using
county-level crop statistics classify crops less consistently but require no field-level labels for training. GMM
achieves over 85% accuracy in states with low crop diversity (Illinois, Iowa, Indiana, Nebraska), but sometimes
performs no better than random when high crop diversity interferes with clustering (North Dakota, South
Dakota, Wisconsin, Michigan). Under the appropriate conditions, these methods offer options for field-resolution
crop type mapping in regions around the world with few or no ground labels.
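To make the feature extraction step concrete, the following is a minimal sketch of a harmonic regression fitted to a single pixel's time series, with the fitted coefficients used as features. It is written in NumPy for illustration and is not the Google Earth Engine implementation used in the study. The function name `harmonic_features`, the choice of two harmonics, the one-year period, and the synthetic vegetation index series are all assumptions made for this example.

```python
import numpy as np


def harmonic_features(t, y, n_harmonics=2, period=1.0):
    """Least-squares fit of y(t) = c + sum_k [a_k cos(w_k t) + b_k sin(w_k t)].

    w_k = 2*pi*k / period. Returns [c, a_1, b_1, ..., a_K, b_K].
    n_harmonics and period are illustrative assumptions, not values
    taken from the paper.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Build the design matrix: a constant column plus one cosine and
    # one sine column per harmonic.
    columns = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        omega = 2.0 * np.pi * k / period
        columns.append(np.cos(omega * t))
        columns.append(np.sin(omega * t))
    design = np.column_stack(columns)

    # Ordinary least squares; observation times need not be evenly spaced.
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


# Synthetic, noise-free vegetation index series over one year
# (time expressed as a fraction of the year); hypothetical data.
t = np.linspace(0.0, 1.0, 40, endpoint=False)
vi = 0.4 + 0.3 * np.cos(2 * np.pi * t) - 0.1 * np.sin(4 * np.pi * t)

features = harmonic_features(t, vi, n_harmonics=2)
```

In this sketch the coefficient vector for each band or index would be concatenated into one feature vector per pixel and passed to a classifier or clustering model. Because the fit is a linear least-squares problem, it tolerates the irregular observation dates that cloud cover produces in Landsat time series.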
1. Introduction
Growing demand for food from an increasing global population
necessitates close monitoring of agricultural activities, especially in
regions of the world where food security remains elusive. Major steps
along the road to closing the yield gap and achieving sustainable food
security include accurately forecasting crop yields, understanding farm
management practices, establishing links between crop choices and
nutritional outcomes, evaluating the impact of changing policies and
aid, and predicting with more certainty how climate change will affect
agriculture. Helpful to all of these higher objectives is first knowing
which crop types are growing in farmers' fields, not only at an aggregated
regional level but at the level of the individual plot.
Traditionally, crop type information has been obtained from field
surveys and censuses. This is true worldwide: in the United States, the
Department of Agriculture's (USDA) National Agricultural Statistics
Service (NASS) and Farm Service Agency (FSA) both collect field-level
crop information (among much other data) via personal interviews with
producers, and use it to monitor and forecast production throughout the
growing season (Common Land Unit (CLU) Information Sheet, n.d.;
June Area, n.d.). In Sub-Saharan Africa, crop type information included
in the World Bank's Living Standards Measurement Study-Integrated
Surveys on Agriculture (LSMS-ISA) Initiative has allowed for the study
of how input use and soil quality affect yield and household income
(Liverpool-Tasie et al., 2017; Bhargava et al., 2018). Such surveys,
however, have their limitations: they are expensive and time-consuming
to conduct, are therefore updated infrequently and cover small
spatial extents in much of the world, and have been shown to contain
biases due to flawed human recall (Carletto et al., 2015; Gourlay et al.,
2017).
With the advent of accessible satellite data, many researchers see an
opportunity to augment surveys or lessen the burden of data collection
https://doi.org/10.1016/j.rse.2018.12.026
Received 9 May 2018; Received in revised form 23 November 2018; Accepted 17 December 2018
* Corresponding author at: Department of Earth System Science, Center on Food Security and the Environment, Stanford University, United States of America.
E-mail address: sherwang@stanford.edu (S. Wang).
Remote Sensing of Environment 222 (2019) 303–317
0034-4257/ © 2019 Elsevier Inc. All rights reserved.