DOI: http://dx.doi.org/10.26483/ijarcs.v8i7.4219
Volume 8, No. 7, July – August 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved
ISSN No. 0976-5697
CLASSIFICATION AND CLUSTERING IN YIELD PREDICTION BASED ON SOIL
PROPERTIES
Gurpinder Singh
Department of Computer Engineering
Punjabi University
Patiala, India
Kanwalpreet Singh Atwal (Asst. Prof.)
Department of Computer Engineering
Punjabi University
Patiala, India
Abstract: Data mining in agriculture is a rapidly growing field of study. Applications such as pig disease prediction, yield prediction based on
rainfall and temperature, and apple quality assurance already incorporate data mining techniques. There is still a gap, however, in research on
the question that is most common and most important to the farmer, i.e. yield prediction. Yield can be influenced by various
factors, such as soil properties, climate, the seed used and the method of cultivation. In this paper, yield is predicted using only
soil properties; data mining shows that there are patterns in soil properties which contribute to an increase or decrease in the
production of wheat. The soil properties considered in this research are Phosphorus, Potassium (K2O), Electrical conductivity, pH value,
Organic carbon and Texture of soil. Yield prediction was done in two phases. In the first phase, the pH value was predicted from the other soil
properties; in the second phase, yield was predicted from the soil properties, including the predicted pH. The techniques used are classification and
clustering, applied through several important algorithms.
Keywords: Agricultural Data Mining, Classification, Clustering, Dataset, Random Forest, K-NN, K-means.
I. INTRODUCTION
Data mining is a technique used for extracting useful
information from a large dataset. It employs many
techniques for evaluating different patterns in a large
amount of data. Data mining is considered to be a step in
the larger process of Knowledge Discovery from Data
(KDD): KDD is the process of discovering useful
knowledge from data, while data mining refers to a
particular step in this process [2]. In data mining, large
datasets relating to a subject or field are first collected
and then preprocessed. Preprocessing is the process of
transforming data to make it appropriate for applying
data mining techniques. It may include cleaning of data,
summarization, transformation etc., so that the data is in
the format required for the analysis. Data warehouses are
the largest storage units of data, and historical data
relating to any field can be found in a data warehouse.
For example, a bank ABC has many branches but one
center or headquarters. Operational data is stored in each
branch's storage unit, but historical data from each
branch is collected and stored in one centralized unit
called a data warehouse, so that in future any kind of
data analysis can be applied to the data.
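The preprocessing steps named above (cleaning, summarization, transformation) can be illustrated with a minimal sketch. The field names ("ph", "oc") and the sample values are hypothetical and chosen only for illustration; they are not taken from the dataset used in this paper, and min-max scaling is assumed here as one possible transformation.

```python
from statistics import mean


def preprocess(records, numeric_fields):
    """Clean, summarize, and transform a list of dict records (illustrative only)."""
    # Cleaning: drop records in which any required field is missing.
    cleaned = [r for r in records
               if all(r.get(f) is not None for f in numeric_fields)]
    if not cleaned:
        return [], {}

    # Summarization: per-field minimum, maximum, and mean.
    summary = {}
    for f in numeric_fields:
        values = [r[f] for r in cleaned]
        summary[f] = {"min": min(values), "max": max(values), "mean": mean(values)}

    # Transformation: min-max scale each field to the range [0, 1]
    # (an assumed choice; other transformations are equally possible).
    transformed = []
    for r in cleaned:
        row = dict(r)
        for f in numeric_fields:
            lo, hi = summary[f]["min"], summary[f]["max"]
            row[f] = 0.0 if hi == lo else (r[f] - lo) / (hi - lo)
        transformed.append(row)
    return transformed, summary


# Hypothetical soil samples: pH and organic carbon (oc); one value is missing.
samples = [
    {"ph": 7.0, "oc": 0.4},
    {"ph": 8.0, "oc": None},
    {"ph": 9.0, "oc": 0.8},
]
scaled, stats = preprocess(samples, ["ph", "oc"])
```

Here the record with the missing value is discarded during cleaning, and the remaining records are rescaled so that fields measured in different units become comparable before any mining technique is applied.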
Data mining incorporates many techniques, such as
clustering, classification, machine learning, Support
Vector Machines, regression, association rules etc.
These techniques can in turn be applied to a dataset
through different algorithms. An overview of these
techniques is shown in Fig. 1.1.
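To make the distinction between two of these techniques concrete, the following minimal sketch contrasts classification (here k-nearest neighbours, K-NN) with clustering (here K-means) on a toy two-dimensional dataset. The points, labels, initial centroids and function names are hypothetical and serve only as an illustration; this is not the implementation or data used in this paper.

```python
import math
from collections import Counter


def knn_predict(train, labels, query, k=3):
    """Classification: label a query point by majority vote of its k nearest neighbours."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, initial_centroids, iterations=10):
    """Clustering: group unlabeled points around centroids, without using any labels."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    assignments = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

# Classification needs labeled training data.
prediction = knn_predict(points, labels, (2.0, 2.0), k=3)

# Clustering discovers the groups from the points alone.
centroids, assignments = kmeans(points, [(1.0, 1.0), (9.0, 8.0)])
```

The design difference is that `knn_predict` relies on known labels, whereas `kmeans` receives only the points and an initial guess for the centroids.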
Data mining in agriculture is an emerging area that is
attracting many data analysts and data mining experts to
focus their studies on it. Summary information about
crop production can help farmers identify crop
losses and prevent them in future [3].
Fig. 1.1 A schematic representation of the classification
of the data mining techniques discussed [1].
Many other problems can be formulated in this field
which, when solved, can help farmers in decision making
and in managing their crops efficiently. Data mining in
agriculture can give farmers information about various
future risks and hazards, and can be used to build more
suitable decision-making systems.
Today, many different areas use data mining. For example,
financial data collected from the banking and financial
industries are often relatively complete, reliable, and
of high quality, which helps methodical data analysis and
data mining. Data mining is also used extensively in the
retail industry, because it collects huge amounts of data on
customer shopping trends, sales of the company etc. This helps the