International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 10 | Oct -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1664
A Review on Naive Bayes (NB), J48 and K-Means Based Mining
Algorithms for Medical Data Mining
Rajbir Kaur¹, Rakesh Gangwar²
¹M.Tech Scholar, Department of Computer Science & Engineering, Beant College of Engineering and Technology, Gurdaspur, Punjab, India
²Associate Professor, Department of Computer Science & Engineering, Beant College of Engineering and Technology, Gurdaspur, Punjab, India
Abstract - Data mining can be defined as the discovery of
meaningful patterns in large quantities of data through their
analysis and exploration. This paper studies various data mining
techniques for improving the accuracy of disease prediction.
It reviews these techniques, together with the evaluation
methods that describe and distinguish them, for the detection
and treatment of diseases in medical data mining.
Key Words: Data Mining Techniques, Naive Bayes, ANN, KNN
1. INTRODUCTION
Data mining is one of the most motivating and critical areas of
study, aimed at extracting knowledge from large amounts of
accumulated data. The database industry has followed an
evolutionary path in developing the following functionalities:
data collection and database creation, data management
(including data storage and retrieval, and database transaction
processing), and data analysis and understanding (involving
data warehousing and data mining). Simply stated, data mining
refers to extracting or "mining" knowledge from large amounts
of data. We have been collecting a myriad of data, from simple
numerical measurements and text documents to more complex
information such as spatial data, multimedia channels, and
hypertext documents. Data mining, also popularly known as
Knowledge Discovery in Databases (KDD), refers to the
nontrivial extraction of implicit, previously unknown and
potentially useful information from data in databases. While
data mining and knowledge discovery in databases (KDD) are
often treated as synonyms, data mining is actually one step of
the knowledge discovery process.
2. DATA MINING TECHNIQUES
Data mining is the process of turning raw data into useful
information so that patterns can be extracted. Various
researchers have studied and applied data mining techniques
to evaluate and classify diseases from medical data.
2.1 ANN (Artificial Neural Network)
An ANN is a classification model made up of interconnected
nodes. Each node, usually drawn as a circle, represents an
artificial neuron, and each connection passes the output of one
neuron to the input of another. The ANN model helps reveal
hidden relationships in historical data, thus facilitating the
prediction and forecasting of patients' diseases. ANN models
can be accurate enough to support important and relevant
decisions regarding data usage.
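To make the idea of one neuron's output feeding another neuron's input concrete, the following is a minimal Python sketch of a forward pass through a tiny 2-2-1 feedforward network. The weights are hand-picked assumptions (not learned from data) chosen so that the network computes XOR; a diagnostic ANN would instead learn its weights from historical patient records, for example by backpropagation, which this sketch omits. The names sigmoid, neuron, HIDDEN, OUTPUT and forward are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic activation applied by every neuron in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs plus a bias, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Hypothetical, hand-picked weights (NOT learned) for a 2-2-1 network computing XOR.
HIDDEN = [([20.0, 20.0], -10.0),    # hidden neuron 1 behaves like OR
          ([-20.0, -20.0], 30.0)]   # hidden neuron 2 behaves like NAND
OUTPUT = ([20.0, 20.0], -30.0)      # output neuron behaves like AND of the hidden outputs


def forward(x):
    """Pass the inputs through the hidden layer, then feed those outputs to the output neuron."""
    hidden = [neuron(x, w, b) for w, b in HIDDEN]
    return neuron(hidden, *OUTPUT)


if __name__ == "__main__":
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, round(forward(x)))
```

Running it prints 0, 1, 1, 0 for the four input pairs. The hidden layer matters here: no single neuron of this kind can represent XOR, which is a small illustration of how interconnected neurons capture relationships that one node alone cannot.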
2.2 NAIVE BAYES
Naïve Bayes is a classification technique based on probability
theory that captures the characteristics of medical data well.
The Bayes model is easy to apply to very large datasets. In
simple terms, a Naive Bayes classifier assumes that the value
of a particular feature is unrelated to the presence or absence
of any other feature, given the class variable. It proceeds
through the following steps:
a) Extract, clean and classify the symptoms of
diseases.
b) Remove punctuation and split the text into tokens.
c) Count the tokens and calculate the probability.
This probability is called the posterior probability,
which is calculated by the formula described in.
d) Combine the probabilities (multiplying them, or
equivalently adding their logarithms) and assign the
most probable class.
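The steps above can be sketched as a small multinomial Naive Bayes classifier in Python. It rests on Bayes' theorem combined with the independence assumption, P(c | x1, ..., xn) ∝ P(c) · P(x1 | c) · ... · P(xn | c), and compares classes by the logarithm of that quantity. Laplace (add-one) smoothing, the four-record symptom dataset and the names tokenize and NaiveBayes are assumptions made for illustration; none of them come from the reviewed work.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Steps (a)-(b): lower-case the text, drop punctuation, split into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over symptom tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # used for the priors P(c)
        self.token_counts = defaultdict(Counter)     # step (c): token counts per class
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(tokenize(doc))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def log_posterior(self, doc, label):
        """Unnormalised log P(label | doc) = log P(label) + sum of log P(token | label)."""
        n_docs = sum(self.class_counts.values())
        total = sum(self.token_counts[label].values())
        score = math.log(self.class_counts[label] / n_docs)
        for token in tokenize(doc):
            count = self.token_counts[label][token]
            score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        """Step (d): add the log-probabilities and return the most probable class."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


if __name__ == "__main__":
    # Made-up toy records, for illustration only.
    docs = ["fever, cough and sore throat", "high fever; cough",
            "chest pain, shortness of breath", "chest pain and sweating"]
    labels = ["flu", "flu", "cardiac", "cardiac"]
    model = NaiveBayes().fit(docs, labels)
    print(model.predict("cough with fever"))
    print(model.predict("sudden chest pain"))
```

On this toy data the two queries are classified as "flu" and "cardiac" respectively. Working in log space turns the combination in step (d) into a sum and avoids numerical underflow when a record contains many tokens; smoothing keeps a single unseen token from forcing a class probability to zero.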
2.3 DECISION TREE
A decision tree is a predictive modeling technique used in
data mining. It divides a larger dataset into smaller subsets,
forming a parent-child hierarchy. Each internal node is labeled
with an input feature and expresses a test on that attribute,
each branch represents an outcome of the test, and each leaf
node holds a class label. Decision trees can handle both
numerical and categorical data and are well suited to large
datasets. The high accuracy of decision tree classification
suggests that the technique can effectively model the
underlying data. Decision trees can also handle large
quantities of input data, whether numeric, purely textual, or
nominal. It is a