Contents lists available at ScienceDirect
Remote Sensing of Environment
journal homepage: www.elsevier.com/locate/rse

Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques

Sherrie Wang a,b,*, George Azzari a, David B. Lobell a

a Department of Earth System Science, Center on Food Security and the Environment, Stanford University, United States of America
b Institute for Computational and Mathematical Engineering, Stanford University, United States of America

ARTICLE INFO
Keywords: Classification; Unsupervised learning; Agriculture; Landsat; Land cover; Machine learning; Google Earth Engine; Big data; Remote sensing

ABSTRACT
Crop type mapping at the field level is necessary for a variety of applications in agricultural monitoring and food security. As remote sensing imagery continues to improve in spatial and temporal resolution, it has become an ever more powerful raw input from which to create crop type maps. Still, automated crop type mapping remains constrained by a lack of field-level crop labels for training supervised classification models. In this study, we explore two approaches to crop type mapping in the US Midwest: random forests transferred across geographic distance and time, and unsupervised methods combined with aggregate crop statistics. We simulated the label-poor setting by withholding labels from the models in various states and years. We validated our methodology using available 30 m spatial resolution crop type labels from the US Department of Agriculture's Cropland Data Layer (CDL). Using Google Earth Engine, we fit harmonic regressions (truncated Fourier series) to the time series of Landsat Surface Reflectance and derived vegetation indices, and extracted the regression coefficients as features for machine learning models. We found that random forests trained on regions and years similar in growing degree days (GDD) transfer to the target region with accuracies consistently exceeding 80%.
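The feature-extraction step described above can be illustrated with a minimal NumPy sketch of harmonic regression on one pixel's time series. Its assumptions: the number of harmonics, the frequency `omega`, the scaling of dates to fractions of a year, the synthetic "NDVI" and "SWIR" series, and the helper names `harmonic_features` and `pixel_feature_vector` are all hypothetical illustrative choices. They are not the authors' settings, and the study itself performed this computation in Google Earth Engine rather than NumPy.

```python
import numpy as np


def harmonic_features(t, y, n_harmonics=2, omega=1.0):
    """Least-squares harmonic regression of one time series (illustrative sketch).

    Fits y(t) ~ c + sum_k [a_k cos(2*pi*k*omega*t) + b_k sin(2*pi*k*omega*t)]
    and returns the coefficient vector [c, a_1, b_1, ..., a_n, b_n].
    Assumption: t holds acquisition dates scaled to fractions of a year;
    y holds one band or vegetation index observed at those dates.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(t)]  # intercept term c
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * omega * t
        columns.append(np.cos(angle))  # regressor for a_k
        columns.append(np.sin(angle))  # regressor for b_k
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def pixel_feature_vector(t, series_by_band, n_harmonics=2, omega=1.0):
    """Concatenate harmonic coefficients across bands/indices for one pixel (hypothetical helper)."""
    return np.concatenate(
        [harmonic_features(t, y, n_harmonics, omega) for y in series_by_band]
    )


# Synthetic example: 25 irregularly spaced, cloud-free dates within one year.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 25))
ndvi = (0.3 + 0.2 * np.cos(2 * np.pi * t) - 0.1 * np.sin(2 * np.pi * t)
        + 0.05 * np.cos(4 * np.pi * t))
swir = 0.15 - 0.05 * np.cos(2 * np.pi * t)
features = pixel_feature_vector(t, [ndvi, swir])
print(features.round(3))  # 10 features: 5 coefficients per series
```

Because the fit is ordinary least squares on whatever dates happen to be available, pixels with different cloud-free acquisition dates still map to feature vectors of the same fixed length, which makes the coefficients convenient inputs for a classifier such as a random forest.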
Accuracy decreases as the difference in GDD grows. Unsupervised Gaussian mixture models (GMM) with class labels derived from county-level crop statistics classify crops less consistently but require no field-level labels for training. GMM achieves over 85% accuracy in states with low crop diversity (Illinois, Iowa, Indiana, Nebraska), but sometimes performs no better than random when high crop diversity interferes with clustering (North Dakota, South Dakota, Wisconsin, Michigan). Under the appropriate conditions, these methods offer options for field-resolution crop type mapping in regions around the world with few or no ground labels.

1. Introduction

Growing demand for food from an increasing global population necessitates close monitoring of agricultural activities, especially in regions of the world where food security remains elusive. Major steps toward closing the yield gap and achieving sustainable food security include accurately forecasting crop yields, understanding farm management practices, establishing links between crop choices and nutritional outcomes, evaluating the impact of changing policies and aid, and predicting with more certainty how climate change will affect agriculture. All of these objectives benefit from first knowing which crop types are growing in farmers' fields, not only at an aggregated regional level but at the level of the individual plot.

Traditionally, crop type information has been obtained from field surveys and censuses. This is true worldwide: in the United States, the National Agricultural Statistics Service (NASS) and the Farm Service Agency (FSA) of the Department of Agriculture (USDA) both collect field-level crop information (among much other data) via personal interviews with producers, and use it to monitor and forecast production throughout the growing season (Common Land Unit (CLU) Information Sheet, n.d.; June Area, n.d.).
In Sub-Saharan Africa, crop type information collected through the World Bank's Living Standards Measurement Study-Integrated Surveys on Agriculture (LSMS-ISA) Initiative has enabled studies of how input use and soil quality affect yield and household income (Liverpool-Tasie et al., 2017; Bhargava et al., 2018). Such surveys, however, have limitations: they are expensive and time-consuming to conduct, and are therefore updated infrequently and cover small spatial extents in much of the world; they have also been shown to contain biases due to flawed human recall (Carletto et al., 2015; Gourlay et al., 2017). With the advent of accessible satellite data, many researchers see an opportunity to augment surveys or lessen the burden of data collection

https://doi.org/10.1016/j.rse.2018.12.026
Received 9 May 2018; Received in revised form 23 November 2018; Accepted 17 December 2018
* Corresponding author at: Department of Earth System Science, Center on Food Security and the Environment, Stanford University, United States of America. E-mail address: sherwang@stanford.edu (S. Wang).
Remote Sensing of Environment 222 (2019) 303–317
0034-4257/ © 2019 Elsevier Inc. All rights reserved.