Precision Agriculture
https://doi.org/10.1007/s11119-018-09628-4
An approach to forecast grain crop yield using multi-layered, multi-farm data sets and machine learning
Patrick Filippi, et al. [full author details at the end of the article]
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
Many broadacre farmers have a time series of crop yield monitor data for their fields, often
augmented with additional data such as soil apparent electrical conductivity surveys and
soil test results. In addition, there are now readily available national and global
datasets, such as rainfall and MODIS satellite imagery, which can be used to represent the crop-growing
environment. Rather than analysing one field at a time, as is typical in precision agriculture
research, there is an opportunity to explore the value of combining data from multiple
fields/farms and years into one dataset. Using these datasets in conjunction with machine
learning approaches allows predictive models of crop yield to be built. In this study,
several large farms in Western Australia served as a case study, and yield monitor data
from wheat, barley and canola crops over three different seasons (2013, 2014 and 2015),
covering ~11 000 to ~17 000 ha in each year, were used. The yield data were
processed to a 10 m grid, and the associated spatial and temporal predictor variables were
collated for each observation point. The data were then aggregated to a 100 m spatial resolution
for modelling yield. Random forest models were used to predict crop yield of wheat, barley
and canola using this dataset. Three separate models were created based on pre-sowing,
mid-season and late-season conditions to explore the changes in the predictive ability of
the model as more within-season information became available. These time points also
coincide with points in the season when management decisions are made, such as the
application of fertiliser. The models were evaluated with cross-validation using both fields and
years for data splitting, and performance was assessed at the field spatial resolution. Cross-validated
results showed that the models predicted yield relatively accurately, with a root mean square
error of 0.36 to 0.42 t ha⁻¹ and a Lin’s concordance correlation coefficient of 0.89 to
0.92 at the field resolution. The models performed better as the season progressed, largely
because more within-season information (e.g. rainfall) became available. The
more years of yield data that were available for a field, the better the predictions were;
future work should therefore use a longer time series of yield data. The generic nature of this
method makes it applicable to other agricultural systems where yield monitor data
are available. Future work should also explore the integration of more data sources into the
models, prediction at finer spatial resolutions within fields, and the possibility of
using the yield forecasts to guide management decisions.
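The abstract describes gridding yield monitor data to 10 m cells and then aggregating them to a 100 m resolution. That step can be illustrated with simple block averaging. This is a minimal sketch under several assumptions:

- The grid is stored as a rectangular nested Python list, with `None` marking cells that have no yield-monitor observation.
- The aggregation rule is the mean of the observed cells. The abstract does not state which statistic was used.
- The function name `aggregate_blocks` is hypothetical.

Going from 10 m to 100 m corresponds to a block factor of 10.

```python
from typing import List, Optional

# A raster of yield values in t/ha; None marks a cell with no yield-monitor observation.
Grid = List[List[Optional[float]]]


def aggregate_blocks(grid: Grid, factor: int = 10) -> Grid:
    """Coarsen a raster by averaging each factor x factor block of cells.

    With 10 m input cells, factor=10 gives 100 m output cells (an assumed
    mean-based aggregation). None values are skipped; a block with no valid
    cells yields None. Edge blocks smaller than factor x factor are averaged
    over the cells they contain.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if n_rows else 0
    coarse: Grid = []
    for r0 in range(0, n_rows, factor):
        out_row: List[Optional[float]] = []
        for c0 in range(0, n_cols, factor):
            # Collect the observed (non-None) values inside this block.
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if grid[r][c] is not None
            ]
            out_row.append(sum(values) / len(values) if values else None)
        coarse.append(out_row)
    return coarse


if __name__ == "__main__":
    # Hypothetical 20 x 20 grid of 10 m cells (200 m x 200 m): 2 t/ha on the left,
    # 4 t/ha on the right, with one missing observation in the top-left corner.
    fine: Grid = [[2.0 if c < 10 else 4.0 for c in range(20)] for _ in range(20)]
    fine[0][0] = None
    print(aggregate_blocks(fine, factor=10))  # -> [[2.0, 4.0], [2.0, 4.0]]
```

Averaging only over observed cells keeps sparsely covered blocks from being pulled toward zero. A median, or a minimum count of valid cells per block, would be equally defensible choices.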
Keywords Yield forecast · Empirical yield prediction · Remote sensing · Machine learning · Random forest · Feature extraction · Precision agriculture
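The evaluation in the abstract uses cross-validation with fields and years as splitting units. It is scored by root mean square error and Lin’s concordance correlation coefficient at the field resolution. The sketch below illustrates this; it is not the authors' code, and it rests on these assumptions:

- Splitting is leave-one-group-out, run separately by field and by year.
- The 100 m predictions and observations are averaged per field and season before scoring.
- A hypothetical placeholder model, `train_mean_model`, stands in for the random forest. It simply predicts the training mean.
- All function names and the toy data are invented for illustration.

The metrics follow the standard definitions:

- RMSE = sqrt(mean((pred − obs)²)).
- Lin’s coefficient = 2·cov(obs, pred) / (var(obs) + var(pred) + (mean(obs) − mean(pred))²), using 1/n moments.

```python
import math
from collections import defaultdict
from statistics import fmean
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# One 100 m cell: (field_id, year, predictor_vector, observed_yield_t_ha).
Record = Tuple[str, int, Sequence[float], float]
Model = Callable[[Sequence[float]], float]


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((pred - obs)^2))."""
    return math.sqrt(fmean((p - o) ** 2 for o, p in zip(obs, pred)))


def lins_ccc(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    mo, mp = fmean(obs), fmean(pred)
    var_o = fmean((o - mo) ** 2 for o in obs)
    var_p = fmean((p - mp) ** 2 for p in pred)
    cov = fmean((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return 2 * cov / (var_o + var_p + (mo - mp) ** 2)


def train_mean_model(train: List[Record]) -> Model:
    """Hypothetical placeholder for a random forest: predicts the training-set mean."""
    mean_yield = fmean(r[3] for r in train)
    return lambda x: mean_yield  # ignores the predictors entirely


def grouped_cv(
    records: List[Record],
    group_of: Callable[[Record], Hashable],
    fit: Callable[[List[Record]], Model],
) -> Tuple[List[float], List[float]]:
    """Leave-one-group-out cross-validation.

    Every record is predicted exactly once, by a model fitted without its group.
    Observations and predictions are then averaged per (field, year), so the
    returned lists are at the field resolution, ordered by (field, year).
    """
    obs_cells: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    pred_cells: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for group in sorted({group_of(r) for r in records}):
        model = fit([r for r in records if group_of(r) != group])
        for field, year, x, y in (r for r in records if group_of(r) == group):
            obs_cells[(field, year)].append(y)
            pred_cells[(field, year)].append(model(x))
    keys = sorted(obs_cells)
    return [fmean(obs_cells[k]) for k in keys], [fmean(pred_cells[k]) for k in keys]


if __name__ == "__main__":
    # Hypothetical toy data: 3 fields x 2 seasons, two 100 m cells per field-season,
    # each with a single invented predictor value.
    data: List[Record] = [
        ("A", 2013, [120.0], 2.1), ("A", 2013, [120.0], 2.3),
        ("A", 2014, [210.0], 3.0), ("A", 2014, [210.0], 3.2),
        ("B", 2013, [110.0], 1.8), ("B", 2013, [110.0], 2.0),
        ("B", 2014, [190.0], 2.7), ("B", 2014, [190.0], 2.9),
        ("C", 2013, [130.0], 2.4), ("C", 2013, [130.0], 2.6),
        ("C", 2014, [230.0], 3.4), ("C", 2014, [230.0], 3.5),
    ]
    splits = {"field": lambda r: r[0], "year": lambda r: r[1]}
    for name, group_of in splits.items():
        obs, pred = grouped_cv(data, group_of, train_mean_model)
        print(f"leave-one-{name}-out: RMSE={rmse(obs, pred):.3f} t/ha, "
              f"CCC={lins_ccc(obs, pred):.3f}")
```

The placeholder model ignores the predictor vectors. They would only matter once `train_mean_model` is replaced by a fitting function that wraps a real regressor, for example scikit-learn's `RandomForestRegressor`. Holding out whole fields or whole years, rather than random cells, avoids scoring a model on cells that are spatially or temporally adjacent to its training data.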