I.J. Information Technology and Computer Science, 2016, 11, 26-32
Published Online November 2016 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2016.11.04
Copyright © 2016 MECS
A Tool for Diabetes Prediction and Monitoring
Using Data Mining Technique
S. R. Priyanka Shetty
Nitte Meenakshi Institute of Technology/Department of CSE, Bangalore, 560064, India
E-mail: siddamshettypriya@gmail.com
Sujata Joshi
Nitte Meenakshi Institute of Technology/Department of CSE, Bangalore, 560064, India
E-mail: sujata_msrp@yahoo.com
Abstract—Data mining is the process of analyzing data
from different perspectives and summarizing it into useful
information. Classification is a data mining task commonly
used in medical data mining. The goal here is to discover
new and useful patterns that give users meaningful
information about diabetes. A diabetes prediction and
monitoring system is designed and implemented using the
ID3 classification algorithm. The symptoms associated with
diabetes are identified and supplied to the prediction
model, which produces the prediction. The monitoring
module analyzes the laboratory test reports of the
patient's blood sugar levels and provides appropriate
awareness messages to the patient through e-mail and a
bar chart.
Index Terms—Data mining, Classification, Decision tree,
ID3, Diabetes dataset, Prediction.
I. INTRODUCTION
A. Data mining
Data mining is the process of extracting hidden
knowledge from large volumes of raw data. It is an
analytical process designed to explore data in search of
consistent patterns and systematic relationships
between variables. Data mining is applied in education,
market basket analysis, customer relationship management,
banking, sports and health care.
In recent years medical data mining has become
prominent, since an enormous amount of medical data is
available for discovering useful patterns. Data mining
techniques such as classification, clustering, association
and outlier analysis help in finding useful patterns in
this data.
Data mining has great potential for the healthcare
industry since it helps health systems to use medical data
for analysis and to offer improved healthcare at reduced
cost. Data mining techniques, when applied to healthcare,
play a significant role in the prediction and diagnosis of
various health problems such as heart disease, diabetes,
cancer and skin disease.
B. Classification
Classification is one of the fundamental tasks in data
mining. It is used to predict the group (class)
membership of a data instance. Classification is
applied in areas such as weather prediction, medical
diagnosis and scientific experiments.
Classification is widely used in medical data mining.
Commonly used classification techniques include decision
trees, Bayesian classifiers, random forests, random trees,
classification by back-propagation and rule-based
classifiers. Classification is performed in two steps:
Model construction: In this step the prediction model is
built from training data using an appropriate algorithm.
Model usage: In this step the prediction model is
applied to actual data and the prediction is made accordingly.
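The two steps can be illustrated with a deliberately small sketch. It is not the ID3 model used in this paper: it learns only a majority-vote rule on a single attribute, which is enough to show the separation between building a model and using it. The attribute name `frequent_thirst`, the class attribute `diabetic`, and all records are hypothetical.

```python
from collections import Counter, defaultdict


def build_model(records, attribute, target):
    """Step 1 (model construction): learn the majority class per attribute value."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[attribute]][row[target]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fallback class for attribute values never seen during construction.
    default = Counter(row[target] for row in records).most_common(1)[0][0]
    return {"attribute": attribute, "rules": rules, "default": default}


def predict(model, record):
    """Step 2 (model usage): apply the learned rules to an unseen record."""
    value = record.get(model["attribute"])
    return model["rules"].get(value, model["default"])


# Hypothetical training records (not taken from the paper's dataset).
training = [
    {"frequent_thirst": "yes", "diabetic": "yes"},
    {"frequent_thirst": "yes", "diabetic": "yes"},
    {"frequent_thirst": "yes", "diabetic": "no"},
    {"frequent_thirst": "no", "diabetic": "no"},
    {"frequent_thirst": "no", "diabetic": "no"},
]

model = build_model(training, "frequent_thirst", "diabetic")
print(predict(model, {"frequent_thirst": "yes"}))
```

Keeping the two functions separate mirrors the two steps: the model is constructed once and can then be applied to any number of new records.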
C. Decision Tree
A decision tree is a commonly used classification
technique in data mining. The decision tree classifier is
built in a top-down manner starting from the root node,
and involves partitioning the data into subsets that
contain instances with similar values.
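This passage does not say how a partition is chosen. In the standard formulation of ID3, the algorithm named in the abstract, the attribute that yields the largest information gain is selected at each node. As a reminder of that standard criterion, with $S$ the set of instances at a node, $C$ the set of classes, $p_c$ the fraction of $S$ in class $c$, and $S_v$ the subset of $S$ in which attribute $A$ takes value $v$:

```latex
H(S) = -\sum_{c \in C} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

For example, a hypothetical subset with three diabetic and one non-diabetic instance has $H(S) = -(0.75 \log_2 0.75 + 0.25 \log_2 0.25) \approx 0.811$, while a subset whose instances all share one class has $H(S) = 0$.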
Decision analysis helps to visualize and explicitly
represent decisions, and the classification tree helps in
decision making. The algorithm creates a model that
predicts the value of a target variable based on several
input variables.
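A minimal sketch of how such a model can be built and queried is given below. It assumes categorical attributes and the textbook ID3 split criterion (information gain based on entropy), and it is not the implementation used in this paper. The attribute names, the class attribute `diabetic`, and the records are hypothetical.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy of the class labels in rows."""
    counts = Counter(r[target] for r in rows)
    return -sum(n / len(rows) * math.log2(n / len(rows)) for n in counts.values())


def split(rows, attribute):
    """Partition rows into subsets that share the same attribute value."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r)
    return groups


def gain(rows, attribute, target):
    """Information gain obtained by splitting rows on attribute."""
    remainder = sum(len(g) / len(rows) * entropy(g, target)
                    for g in split(rows, attribute).values())
    return entropy(rows, target) - remainder


def build_tree(rows, attributes, target):
    """Build the tree top-down, choosing the highest-gain attribute at each node."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # leaf node: a class label
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {value: build_tree(subset, remaining, target)
                for value, subset in split(rows, best).items()}
    return {"attribute": best, "default": majority, "branches": branches}


def classify(tree, record):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        value = record.get(tree["attribute"])
        tree = tree["branches"].get(value, tree["default"])
    return tree


# Hypothetical records (not the paper's dataset).
rows = [
    {"excessive_thirst": "yes", "family_history": "yes", "diabetic": "yes"},
    {"excessive_thirst": "yes", "family_history": "no", "diabetic": "yes"},
    {"excessive_thirst": "yes", "family_history": "no", "diabetic": "yes"},
    {"excessive_thirst": "no", "family_history": "yes", "diabetic": "yes"},
    {"excessive_thirst": "no", "family_history": "no", "diabetic": "no"},
    {"excessive_thirst": "no", "family_history": "no", "diabetic": "no"},
]

tree = build_tree(rows, ["excessive_thirst", "family_history"], "diabetic")
print(classify(tree, {"excessive_thirst": "no", "family_history": "yes"}))
```

Each internal node stores the majority class of its training subset as a fallback, so a record with an attribute value unseen during construction still receives a prediction.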
Real-world applications of decision trees are found in
medicine, agriculture, financial analysis, biometric
engineering, plant disease and software development.
Commonly used decision tree algorithms are ID3, C4.5
and CART.
The decision tree algorithm is widely used because it is
simple to understand and can handle both numeric and
categorical data. It is also robust and performs well
with large datasets.
D. Diabetes
Diabetes mellitus (DM) is a chronic disease in which
a person has high blood sugar levels. It is a lifelong
condition that affects the body's ability to use the energy
found in food. Once the body absorbs simple sugar (sucrose) it