International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 3966
GDPS - General Disease Prediction System
Shratik J. Mishra¹, Albar M. Vasi², Vinay S. Menon³, Prof. K. Jayamalini⁴
¹,²,³,⁴Department of Computer Engineering, Shree L.R. Tiwari College of Engineering, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The successful application of data mining in
highly visible fields such as e-business, commerce and trade
has led to its application in other industries. The medical
environment remains information rich but knowledge poor:
a wealth of data is available within medical systems, yet
there is a lack of powerful analysis tools to identify hidden
relationships and trends in that data. Disease is a term that
covers a large number of health care conditions related to
the body. These medical conditions describe unexpected
health conditions that directly affect all parts of the body.
Medical data mining techniques such as association rule
mining, classification and clustering are implemented to
analyze different kinds of general body-based problems.
Classification is an important problem in data mining, and a
number of popular classifiers construct decision trees to
generate class models. In this work, data classification is
based on the ID3 decision tree algorithm; its accuracy is
estimated using entropy-based cross-validation and partition
techniques, and the results are compared.
Key Words: Disease, prediction, machine-learning, data
mining, ID3.
1. INTRODUCTION
It is estimated that more than 70% of people in India are
prone to general body diseases such as viral infections, flu,
cough and cold every two months. Because many people do
not realize that general body diseases could be symptoms of
something more harmful, 25% of the population succumbs to
death as a result of ignoring early general body symptoms.
This is a dangerous and alarming situation for the
population. Hence, identifying or predicting the disease at
the earliest is very important to avoid unwanted casualties.
The currently available systems are either dedicated to a
particular disease or, for generalized disease, still in the
research phase for algorithms.
The purpose of this system is to provide predictions for
general, commonly occurring diseases that, when unchecked,
can turn fatal. The system applies data mining techniques
and the ID3 decision tree algorithm. It predicts the most
probable disease based on the given symptoms and suggests
the precautionary measures required to keep the disease
from worsening. It will also help doctors analyse the pattern
of diseases present in society. In this project, the disease
prediction system carries out data mining in its preliminary
stages and is then trained using machine learning and data
mining techniques.
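The attribute-selection step at the core of ID3 can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the symptom names (fever, cough), the class labels and the four toy records are hypothetical and are not taken from the paper's dataset. ID3 chooses, at each node, the attribute whose split yields the largest information gain, i.e. the largest reduction in entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the highest information gain (the ID3 split criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records: symptoms observed and the diagnosed condition.
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "cold", "healthy"]

print(best_attribute(rows, labels, ["fever", "cough"]))
```

In this toy data, splitting on fever separates the flu cases completely, so it has the higher gain and would become the root of the tree. A full ID3 implementation would apply the same selection recursively to each partition.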
The paper is divided into five sections. The first section gives
a brief introduction to the system. The second section covers
data mining and the study of related existing systems. The
third section details the implementation of the system. The
fourth section presents the results obtained using the mining
algorithms. Finally, the conclusion summarizes the system
and its future scope.
2. LITERATURE REVIEW
This section presents the literature survey for the project
and reviews the existing systems, including those in actual
use, that inspired the authors to proceed with the project
and its problem statement.
2.1 Existing Systems
Narander Kumar and Sabita Khatri [1] compared different
algorithms such as k-NN, Naïve Bayes, Random Forest and
J48 using performance measures such as ROC, kappa
statistic, RMSE and MAE in the WEKA tool, and also compared
the classifiers on various accuracy measures. They concluded
that Random Forest gave better accuracy on the chronic
kidney disease dataset that was used.
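For readers unfamiliar with the measures named above, the following is a minimal sketch of how the kappa statistic, MAE and RMSE can be computed. It is not the WEKA implementation and not the procedure used in [1]; the function names, class labels and sample values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted labels, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected if predictions were independent of the true labels.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical class labels for six patients.
actual_labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
predicted_labels = ["ckd", "ckd", "notckd", "notckd", "notckd", "ckd"]
print(cohen_kappa(actual_labels, predicted_labels))

# Hypothetical true outcomes (1 = disease) and predicted probabilities.
actual_values = [1, 0, 1, 0]
predicted_values = [0.9, 0.2, 0.6, 0.1]
print(mean_absolute_error(actual_values, predicted_values))
print(root_mean_squared_error(actual_values, predicted_values))
```

Kappa is useful alongside plain accuracy because it discounts the agreement a classifier would achieve by chance, which matters when one class dominates the dataset.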
Monika Gandhi and Dr. Shailendra Singh [2] analyzed different
data mining algorithms such as Naïve Bayes, neural networks
and decision trees for their accuracy in predicting heart
disease.
Marija Sultana, Afrin Haider and Md. Shorif Uddin [3]
analyzed algorithms such as K-star, J48, SMO, Bayes Net and
Multilayer Perceptron using the WEKA tool on a heart disease
prediction dataset. The performance of these data mining
techniques is assessed by combining the results of measures
such as predictive accuracy, ROC curve and AUC value. The
result obtained is that SMO and Bayes Net perform better
than the other algorithms mentioned.
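The AUC value mentioned above can be read as the probability that a classifier scores a randomly chosen positive case higher than a randomly chosen negative one. The sketch below computes it directly from that definition. It is an illustration only, not the method used in [3]; the function name, scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive (disease) case, 0 for a negative case.
    scores: classifier confidence that the case is positive.
    Requires at least one positive and one negative case.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for four patients.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

The pairwise form is quadratic in the number of cases, which is fine for small datasets; rank-based formulations give the same value more efficiently on larger ones.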
Girija D.K., Dr. M.S. Shashidhara and M. Giri [4] use neural
networks to predict the presence of uterine fibroid disease.
The experimental results show an accuracy of 98% using a
multilayer perceptron neural network and data mining.
This project focuses on the most common form of cancer
in women, i.e. breast cancer, and its recurrence. The
authors, Uma Ojha and Dr. Savita Goel, have studied
many data mining algorithms in both