International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6616

A Machine Learning Approach to Crop Yield Prediction

Aditya S Sreerama¹, Dr. B. M. Sagar²
¹B.E. Student, ²Head of Department
¹,²Dept. of Information Science and Engineering, RV College of Engineering®, Bangalore, Karnataka, India

Abstract - Yield prediction helps farmers reduce their losses and obtain the best prices for their crops. Today, owing to unforeseeable climate change, farmers are unable to achieve a reasonable level of crop production. To feed the world’s growing population, it is important to integrate new and innovative technologies and resources into the agricultural sector. This study focuses on training machine learning models to predict the production of the world’s most widely grown crops. Factors such as rainfall, temperature and pesticide input are considered in predicting crop yield. We compare the accuracy of regression models such as the Decision Tree Regressor, Gradient Boosting Regressor and Random Forest Regressor.

Key Words: Crop Yield Prediction; Regression; Machine Learning

1. INTRODUCTION

Agriculture is one of the most significant factors in the growth of developing countries such as India, where the agricultural ecosystem contributes about 17-18% of the country’s GDP. Agriculture and related industries employ more than 70% of the nation’s population and are thus a key source of livelihood for many. Agriculture also plays a crucial role in the global economy.
With the continued growth of the human population, knowledge of global crop yields is essential to resolving food security issues and reducing the effects of climate change. Crop yield forecasting is an important agricultural problem. Policy makers depend on accurate predictions to frame import and export policies that strengthen national food security. Farmers also benefit from accurate predictions, which help them make informed strategic management and financial decisions. Agricultural yield depends primarily on weather conditions, such as rainfall and temperature, and on environmental conditions, such as soil quality and pesticide use. Accurate knowledge of historical crop yields is critical for decision-making on agricultural risk management and for future predictions.

Although cuisine varies greatly across the world, the essential foods that sustain humans are very similar: the world consumes large quantities of maize, wheat, rice and other staple crops. In this study, machine learning approaches are used to forecast the yields of the 10 most consumed crops, using publicly accessible data from the Food and Agriculture Organization (FAO) and the World Data Bank.

Crop yield prediction can be extremely challenging because of the highly variable, non-linear and complex factors that affect it. In addition, agricultural data is not always collected consistently over long periods of time, and it is common to find unorganized and incomplete data. In recent times, with the increased accessibility of machine learning algorithms, the problem has become more tractable. Models that can be used for this kind of prediction include multivariate regression, decision trees, association rule mining and artificial neural networks, to mention a few.

2. OVERVIEW OF REGRESSION ANALYSIS

Regression analysis comprises techniques that take a statistical approach to estimating the relationship between a dependent variable and one or more independent variables. The dependent variable, also called the ‘outcome variable’, is the crop yield in our study; the independent variables, also called ‘predictors’ or ‘covariates’, are the weather and environmental conditions, such as rainfall, temperature and pesticide usage. The analyst aims to find a line, or a more complex function, that fits the given data according to a certain mathematical criterion without overfitting or underfitting the data. In machine learning, regression analysis is used primarily for prediction and forecasting.

In this study, we compare the accuracy of different regression models in predicting crop yield, using a metric called the R² score (the coefficient of determination). R² is a statistical measure of the proportion of the variation in the dependent variable that is explained by the independent variables in a given regression model. It is computed as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between the observed and predicted values, and SS_tot is the sum of squared differences between the observed values and their mean. A value of 1 means that 100% of the variation in the dependent variable is explained by the model, and a value of 0 means the model does no better than always predicting the mean. R² usually lies between 0 and 1, although it can be negative on held-out data when a model performs worse than predicting the mean.

2.1 Decision Tree Regressor

The decision tree regressor is a method commonly used in data mining applications. The aim of the model is to predict the value of a dependent variable from several independent variables. The tree repeatedly splits the data on the value of a particular independent variable, progressively partitioning the samples into groups with similar values of the dependent variable, which makes prediction easier. Each internal node of the tree asks a simple question about the value of a certain input feature. Based on the possible