AI 2023, 4, 1–15. https://doi.org/10.3390/ai4010001 www.mdpi.com/journal/ai
Article
Data Synthesis for Alfalfa Biomass Yield Estimation
Jonathan Vance 1, Khaled Rasheed 1,2,*, Ali Missaoui 3 and Frederick W. Maier 2
1 School of Computing, University of Georgia, 415 Boyd Graduate Studies, 200 D. W. Brooks Drive,
Athens, GA 30602, USA
2 Institute for Artificial Intelligence, University of Georgia, 515 Boyd Graduate Studies, 200 D. W. Brooks
Drive, Athens, GA 30602, USA
3 Department of Crop and Soil Sciences, Institute of Plant Breeding Genetics and Genomics,
University of Georgia, 4317 Miller Plant Science, Athens, GA 30602, USA
* Correspondence: khaled@uga.edu
Abstract: Alfalfa is critical to global food security. Alfalfa data are abundant at the national level
in the U.S. but often scarce locally, which limits the potential performance of machine learning (ML)
models in predicting alfalfa biomass yields. Training ML models on local-only data results in very
low estimation accuracy when the datasets are very small. We therefore explore synthesizing data
from non-local sources to estimate biomass yields labeled as high, medium, or low. One option to
remedy scarce local data is to train models using non-local data; however, this only works about as
well as using local data. Instead, we propose a novel pipeline that trains models using data
synthesized from non-local data to estimate local crop yields. Our pipeline, synthesized non-local
training (SNLT, pronounced like "sunlight"), achieves a gain of 42.9% in accuracy over the best
results from regular non-local and local training on our very small target dataset. This pipeline
produced the highest accuracy, 85.7%, with a decision tree classifier. From these results, we
conclude that SNLT can be a useful tool in helping to estimate crop yields with ML. Furthermore, we
propose a software application called Predict Your CropS (PYCS, pronounced like "Pisces") designed
to help farmers and researchers estimate and predict crop yields based on pretrained models.
Keywords: machine learning; data synthesis; generative models; alfalfa; biomass; precision
agriculture; classification; climate change; yield prediction; deep learning
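The SNLT idea summarized in the abstract (synthesize training data from non-local data, train a classifier on the synthetic data, then estimate labels for a small local target) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy data, the function names, a per-class Gaussian sampler standing in for the deep generative models, and a nearest-centroid rule standing in for the decision tree classifier.

```python
import random
import statistics

LABELS = ("low", "medium", "high")  # yield classes named in the abstract


def make_toy_data(rng, n_per_class, shift=0.0):
    """Hypothetical toy rows of (features, label); not real alfalfa data."""
    centers = {"low": 10.0, "medium": 20.0, "high": 30.0}
    rows = []
    for label in LABELS:
        c = centers[label]
        for _ in range(n_per_class):
            features = [rng.gauss(c + shift, 2.0), rng.gauss(c / 2 + shift, 1.0)]
            rows.append((features, label))
    return rows


def synthesize(rows, n_per_class, rng):
    """Stand-in generator: sample each feature from a per-class Gaussian
    fitted to the source rows (a simple substitute for an AAE or GAN)."""
    synthetic = []
    for label in LABELS:
        columns = list(zip(*[x for x, y in rows if y == label]))
        means = [statistics.mean(col) for col in columns]
        stdevs = [statistics.stdev(col) for col in columns]
        for _ in range(n_per_class):
            sample = [rng.gauss(m, s) for m, s in zip(means, stdevs)]
            synthetic.append((sample, label))
    return synthetic


def fit_centroids(rows):
    """Stand-in classifier: one mean feature vector per class."""
    return {
        label: [statistics.mean(col)
                for col in zip(*[x for x, y in rows if y == label])]
        for label in LABELS
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


rng = random.Random(0)
non_local = make_toy_data(rng, n_per_class=50)               # large non-local source
local_target = make_toy_data(rng, n_per_class=3, shift=0.5)  # very small local target
synthetic = synthesize(non_local, n_per_class=200, rng=rng)  # step 1: synthesize
model = fit_centroids(synthetic)                             # step 2: train on synthetic data
snlt_accuracy = accuracy(model, local_target)                # step 3: estimate local targets
print(f"accuracy on toy local target: {snlt_accuracy:.3f}")
```

The sketch only shows the data flow; the accuracy it prints reflects the toy data and says nothing about the results reported in this article.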
1. Introduction
The alfalfa crop is an important livestock feed and is crucial to global food security.
In previous work, we used climate data to estimate alfalfa biomass yields. We compared
the accuracies of feature selection techniques and machine learning (ML) models for this
task. We obtained promising results using local training data, with R² values over 0.90, as
we had access to rich curated datasets from state university variety trials [1]. However,
since our team is developing a software application to aid real-world farmers, whose
datasets may be much smaller, the current work addresses the problem of estimating yields
for very small target datasets. We find that local training on very small target datasets
results in very low accuracy, while non-local training on much larger datasets performs
only about as well as local training. Our solution combines ideas inspired by [2], which
shows success using pretrained models and sparse datasets, with ideas inspired by [3,4],
which show the promise of deep learning generative models like the adversarial
autoencoder (AAE) [3] and generative adversarial networks (GANs) [4]. We propose a novel
pipeline where models are trained with data generated, or synthesized, by other deep
learning (DL) models. In this pipeline, the training data are synthesized from
non-local sources, and the resulting classifiers estimate local targets. We call this
synthesized non-local training (SNLT, pronounced like "sunlight"), and we show it consistently
achieves better accuracy than both local and non-local training. We extend the work of Xu
Citation: Vance, J.; Rasheed, K.; Missaoui, A.; Maier, F.W. Data Synthesis for Alfalfa Biomass Yield
Estimation. AI 2023, 4, 1–15. https://doi.org/10.3390/ai4010001
Academic Editor: Arslan Munir
Received: 12 November 2022
Revised: 29 November 2022
Accepted: 30 November 2022
Published: 21 December 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).