International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6616
A Machine Learning Approach to Crop Yield Prediction
Aditya S Sreerama¹, Dr. B. M. Sagar²
¹B.E Student, ²Head of Department
¹,²Dept. of Information Science and Engineering, RV College of Engineering®, Bangalore, Karnataka, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Yield prediction helps farmers reduce their
losses and obtain the best prices for their crops. At present,
owing to unforeseeable climate change, farmers are often
unable to achieve a reasonable level of crop production. To
feed the world’s growing population, it is important to
integrate new and innovative technologies and resources
into the agricultural sector. This study focuses on training
machine learning models to predict the production of the
world’s most widely grown crops. Factors such as rainfall,
temperature and pesticide input are considered in
predicting the crop yield. We compare the accuracy of
regression models such as the Decision Tree Regressor,
Gradient Boosting Regressor and Random Forest
Regressor.
Key Words: Crop Yield Prediction; Regression; Machine
Learning
1. INTRODUCTION
Agriculture is one of the most significant factors in the
growth of developing countries such as India, where the
agricultural ecosystem contributes about 17-18% of the
country’s GDP. Agriculture and related industries employ
more than 70% of the nation’s population and are thus a
key source of livelihood for many. Agriculture also plays a
crucial role in the global economy. With the continued
expansion of the human population, awareness of global
crop yields is essential to resolving food security issues
and reducing the effects of climate change. Crop yield
forecasting is therefore an important agricultural problem.
Policy makers depend on accurate predictions to pass
legislation on import and export policies that strengthens
national food security. Farmers also benefit from accurate
predictions, which let them make informed strategic
management and financial decisions.
Agricultural yield depends primarily on weather
conditions such as rainfall and temperature, and on
environmental conditions such as soil quality and
pesticide use. Accurate knowledge of the history of crop
yields is critical for decision-making on agricultural risk
management and for future predictions.
Although cuisine varies greatly across the world, the
essential ingredients that sustain humans are very
similar. The world consumes large quantities of maize,
wheat, rice and other basic crops. In this study, machine
learning approaches are used to forecast the yield of the
10 most consumed crops using publicly accessible data
from the Food and Agriculture Organization (FAO) and the
World Data Bank.
Crop yield prediction can be extremely challenging because
the factors that affect it are highly variable, non-linear
and complex. In addition, agricultural data is not always
collected consistently over long periods of time, and it is
very common to find unorganized and incomplete data. In
recent times, with increased accessibility of machine
learning algorithms, the problem has become more
tractable. Models that can be used for this kind of
prediction include multivariate regression, decision trees,
association rule mining and artificial neural networks, to
mention a few.
2. OVERVIEW OF REGRESSION ANALYSIS
Regression analysis comprises techniques that take a
statistical approach to estimating the relationship between
a dependent variable (also called the ‘outcome variable’,
which is the crop yield in our study) and independent
variables (also called ‘predictors’ or ‘covariates’, which
include weather and environmental conditions such as
rainfall, temperature and pesticide usage in our study). The
data analyst aims to find a line, or a more complex
relationship, that fits the given data according to a certain
mathematical criterion without overfitting or underfitting
it. In the field of machine learning, regression analysis is
primarily used for prediction and forecasting.
In this study we compare the accuracy of different
regression models in predicting crop yield. We measure
accuracy with a metric called the R² score. R² is a
statistical measure of the proportion of the variation in the
dependent variable that can be explained by the
independent variables in a given regression model. The R²
value typically lies between 0 and 1, where 1 means that
100% of the variation in the dependent variable is
explained by the independent variables. On data the model
was not fitted to, R² can fall below 0 when the model
predicts worse than simply using the mean of the observed
values.
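As an illustration of the metric, the following minimal sketch computes R² from its usual definition, 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the observed values from their mean. The function name r2_score and the yield values are hypothetical and chosen only for illustration; they are not taken from the study's dataset or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared prediction errors
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total variation of the observed values around their mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed yields (illustrative numbers only)
observed = [2.0, 4.0, 6.0, 8.0]

perfect = [2.0, 4.0, 6.0, 8.0]    # predictions match exactly
mean_only = [5.0, 5.0, 5.0, 5.0]  # always predicts the mean
poor = [8.0, 6.0, 4.0, 2.0]       # worse than predicting the mean

print(r2_score(observed, perfect))    # 1.0
print(r2_score(observed, mean_only))  # 0.0
print(r2_score(observed, poor))       # -3.0
```

The three cases show the interpretation of the score: 1 for a perfect fit, 0 for a model no better than the mean, and a negative value for a model that is worse than the mean.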
2.1 Decision Tree Regressor
The Decision Tree Regressor is a method commonly used
in data mining applications. The aim of the model is to
predict the value of a dependent variable based on several
independent variables.
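To make this concrete, the sketch below fits the smallest possible regression tree: a single split on one independent variable, chosen to minimize the squared error of predicting each side by its mean. This is a simplified illustration and not the implementation used in the study; the function names fit_stump and predict_stump and the rainfall and yield numbers are hypothetical. A full decision tree regressor would apply the same split search recursively and across several features.

```python
def fit_stump(xs, ys):
    """Find the threshold on one feature that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Each side of the split predicts the mean of its samples
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Predict by checking which side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical data: rainfall (mm) and crop yield (illustrative only)
rainfall = [400.0, 500.0, 600.0, 1000.0, 1100.0, 1200.0]
crop_yield = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]

stump = fit_stump(rainfall, crop_yield)
print(stump[0])                      # chosen rainfall threshold
print(predict_stump(stump, 450.0))   # prediction for a low-rainfall sample
print(predict_stump(stump, 1150.0))  # prediction for a high-rainfall sample
```

With these numbers the split falls between the low-rainfall and high-rainfall groups, and each group is predicted by its average yield.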
The decision tree iteratively makes decisions on the value
of a particular independent variable, repeatedly
partitioning the data so that the dependent variable
becomes easier to predict.
Each internal node of the tree asks a simple question about
the value of a certain input feature. Based on the possible