DOI: http://dx.doi.org/10.26483/ijarcs.v8i7.4219
Volume 8, No. 7, July – August 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved
ISSN No. 0976-5697

CLASSIFICATION AND CLUSTERING IN YIELD PREDICTION BASED ON SOIL PROPERTIES

Gurpinder Singh
Department of Computer Engineering, Punjabi University, Patiala, India

Kanwalpreet Singh Atwal (Asst. Prof.)
Department of Computer Engineering, Punjabi University, Patiala, India

Abstract: Data mining in agriculture is becoming a trending subject. Various applications, such as pig disease prediction, yield prediction based on rainfall and temperature, and quality assurance of apples, incorporate data mining techniques. Still, there is a gap in the study of the most common yet most important concern for the farmer, i.e., yield prediction. Yield can be influenced by various factors such as soil properties, climate, seed used, and method of cultivation. In this paper, yield is predicted using only the soil properties; data mining shows that there are patterns in soil properties which contribute to an increase or decrease in the production of wheat. The soil properties considered in this research are Phosphorus, Potassium (K2O), electrical conductivity, pH value, organic carbon, and soil texture. Yield prediction was done in two phases. In the first phase, the pH value was predicted from the other soil properties; in the second phase, yield was predicted from the soil properties, including the predicted pH. The techniques used are classification and clustering, with some important algorithms.

Keywords: Agricultural Data Mining, Classification, Clustering, Dataset, Random Forest, K-NN, K-means.

I. INTRODUCTION

Data Mining is a technique used for extracting useful information from a large dataset.
Data Mining uses many techniques for evaluating different patterns in a large amount of data. Data Mining is considered to be a step in the larger process of Knowledge Discovery from Data (KDD): KDD is the overall process of discovering useful knowledge from data, while data mining refers to a particular step in this process [2]. In data mining, large datasets relating to a subject or field are first collected and then preprocessed. Preprocessing is the process of transforming data to make it appropriate for applying data mining techniques. It may include cleaning of data, summarization, transformation, etc., so that the data ends up in the format required for the analysis.

Data warehouses are large, centralized storage units of data in which historical data relating to a field can be found. For example, a bank ABC has many branches but one central headquarters. Operational data is stored in each branch's storage unit, but historical data from each branch is collected and stored in one centralized unit called a data warehouse, so that in future any kind of data analysis can be applied to the data.

Data Mining incorporates many techniques, such as clustering, classification, machine learning, Support Vector Machines, regression, and association rules. Each of these techniques can in turn be applied to a dataset through different algorithms. An overview of these techniques is shown in Fig. 1.1.

Fig. 1.1 A schematic representation of the classification of the data mining techniques discussed [1].

Data Mining in Agriculture is an emerging area that is attracting many data analysts and data mining experts to focus their studies on it. Summary information about crop production can help farmers identify crop losses and prevent them in future [3]. Many other problems can be formulated in this field which, when solved, can help farmers in decision making and in managing their crops efficiently.
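The preprocessing steps named in this section (cleaning, summarization, transformation) can be illustrated with a minimal sketch. It is not the procedure used in this paper: the field names, the sample values, and the helper functions `clean`, `summarize`, and `min_max_scale` are all hypothetical, and min-max scaling is assumed here only as one common example of a transformation.

```python
from statistics import mean

# Hypothetical soil samples; the values are made up for illustration only.
RAW_SAMPLES = [
    {"ph": 7.2, "organic_carbon": 0.45, "ec": 0.30},
    {"ph": None, "organic_carbon": 0.52, "ec": 0.28},  # missing pH reading
    {"ph": 8.1, "organic_carbon": 0.38, "ec": 0.41},
    {"ph": 6.9, "organic_carbon": 0.60, "ec": 0.25},
]


def clean(records):
    """Cleaning: drop every record that has a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def summarize(records):
    """Summarization: mean of each field across the given records."""
    fields = list(records[0].keys())
    return {f: mean(r[f] for r in records) for f in fields}


def min_max_scale(records):
    """Transformation: rescale each field to the range [0, 1]."""
    fields = list(records[0].keys())
    lo = {f: min(r[f] for r in records) for f in fields}
    hi = {f: max(r[f] for r in records) for f in fields}
    scaled = []
    for r in records:
        row = {}
        for f in fields:
            span = hi[f] - lo[f]
            # A constant column has zero span; map it to 0.0 to avoid
            # dividing by zero.
            row[f] = (r[f] - lo[f]) / span if span else 0.0
        scaled.append(row)
    return scaled


if __name__ == "__main__":
    cleaned = clean(RAW_SAMPLES)
    print("records kept:", len(cleaned))
    print("field means:", summarize(cleaned))
    print("scaled:", min_max_scale(cleaned))
```

Dropping incomplete records is the simplest cleaning policy; depending on the dataset, filling in missing values may be preferable, and that choice is left open here.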
Data mining in agriculture can give farmers information about various future risks and hazards, and it can be used to build more suitable decision-making systems. Today, many different areas use data mining. For example, financial data collected from the banking and financial industries are often relatively complete, reliable, and of high quality, which facilitates systematic data analysis and data mining. Data mining is also used extensively in the retail industry, which collects huge amounts of data on customer shopping trends, company sales, etc. This helps the