Communication and Computing Systems – Prasad et al. (Eds)
© 2017 Taylor & Francis Group, London, ISBN 978-1-138-02952-1
Classification of Pima Indian diabetes dataset using Naive Bayes
with genetic algorithm as an attribute selection
Dilip Kumar Choubey, Sanchita Paul & Santosh Kumar
CSE, BIT, Mesra, Ranchi, India
Shankar Kumar
Polytechnic, BIT, Mesra, Ranchi, India
ABSTRACT: Diabetes is a condition in which blood sugar remains above the desired level on a sustained basis. The prime objective of this research work is to provide a better classification of diabetes. Several methods have already been implemented for classifying diabetes datasets. In the medical sector, classification systems have been widely used to exploit patients' data and to build predictive models or sets of rules. In this manuscript, Naive Bayes (NBs) is first used for classification on all the attributes; a Genetic Algorithm (GA) is then used for attribute selection, and NBs is applied to the selected attributes for classification. The experimental results report the performance of this approach on the Pima Indian Diabetes Dataset (PIDD) and show that it provides better classification for diagnosis.
2.1 Diabetes disease dataset used
The Pima Indian Diabetes Dataset (PIDD) has been taken from the UCI Machine Learning Repository. The same dataset has also been used in earlier work (Polat and Gunes 2007; Seera and Lim 2014; Lukka 2011; Gajni and Abadeh 2011; Choubey and Paul 2016; Ephzibah 2011; Choubey and Paul 2015).
1 INTRODUCTION
Diabetes is a major public health challenge worldwide and one of the most widespread diseases today. In this manuscript, a Genetic Algorithm (GA) is used as an attribute (feature) selection method, selecting four of the eight attributes. Naive Bayes (NBs) is a statistical, supervised learning method for classification that applies Bayes' theorem under the assumption that the attributes are conditionally independent given the class. Here, NBs is used to classify the diabetes diagnosis.
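The paper does not state which Naive Bayes variant is used. Because the PIDD attributes are numeric, one common choice is Gaussian Naive Bayes, which models each attribute as normally distributed within each class and predicts the class c that maximises log P(c) + Σ log N(x_i; μ_ci, σ²_ci). The following is a minimal from-scratch sketch under that assumption; the class name `GaussianNaiveBayes` and the `var_smoothing` parameter are hypothetical, and the toy values in the usage example are illustrative, not PIDD records.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes (assumed variant): each attribute is an
    independent normal distribution within each class."""

    def __init__(self, var_smoothing=1e-9):
        # Small constant added to every variance to avoid division by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.stats = {}
        for label, rows in groups.items():
            m = len(rows)
            cols = list(zip(*rows))
            means = [sum(col) / m for col in cols]
            variances = [
                sum((v - mu) ** 2 for v in col) / m + self.var_smoothing
                for col, mu in zip(cols, means)
            ]
            # Store the log prior and the per-attribute Gaussian parameters.
            self.stats[label] = (math.log(m / n), means, variances)
        return self

    def _log_posterior(self, row, label):
        # Unnormalised log posterior: log prior + sum of Gaussian log-likelihoods.
        log_prior, means, variances = self.stats[label]
        total = log_prior
        for x, mu, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(row, c)) for row in X]


# Usage on toy two-attribute data (not the PIDD):
X = [[1.0, 85.0], [1.2, 90.0], [5.0, 150.0], [5.5, 160.0]]
y = [0, 0, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[1.1, 88.0], [5.2, 155.0]]))  # [0, 1]
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied.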
The paper is organized as follows: the proposed methodology is discussed in Section 2, results and discussion are presented in Section 3, and conclusions and future directions are given in Section 4.
2 PROPOSED METHODOLOGY
The proposed methodology uses GA for attribute selection and NBs for classification on the PIDD, which has been taken from the UCI Machine Learning Repository.
The block diagram of the proposed approach is shown in Figure 1, and its steps are as follows:
1. Take the PIDD from the UCI Machine Learning Repository.
2. Apply GA for attribute selection on the PIDD.
3. Classify with NBs, using both the selected attributes and all the attributes of the PIDD.
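The paper does not specify the GA settings (encoding, fitness function, operators, population size). The steps above can be sketched as a wrapper-style GA under assumed settings: each chromosome is a bit mask over the eight attributes, fitness is the hold-out accuracy of a Gaussian Naive Bayes trained on the masked attributes, and the GA uses tournament selection, one-point crossover, bit-flip mutation, and elitism. All function names (`fit_nb`, `predict_nb`, `fitness`, `ga_select`) and parameter values are hypothetical. The sketch does not force exactly four attributes and is not claimed to reproduce the paper's selection; the demo uses synthetic data, not the PIDD.

```python
import math
import random
from collections import defaultdict


def fit_nb(X, y, eps=1e-9):
    """Gaussian Naive Bayes parameters per class: (log prior, means, variances)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    params = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(rows) for c in cols]
        variances = [sum((v - mu) ** 2 for v in c) / len(rows) + eps
                     for c, mu in zip(cols, means)]
        params[label] = (math.log(len(rows) / len(X)), means, variances)
    return params


def predict_nb(params, row):
    def score(label):
        log_prior, means, variances = params[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
                               for x, mu, var in zip(row, means, variances))
    return max(params, key=score)


def fitness(mask, X_train, y_train, X_val, y_val):
    """Validation accuracy of Naive Bayes restricted to attributes where mask[i] == 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty attribute subset cannot classify anything

    def select(rows):
        return [[r[i] for i in idx] for r in rows]

    params = fit_nb(select(X_train), y_train)
    preds = [predict_nb(params, r) for r in select(X_val)]
    return sum(p == t for p, t in zip(preds, y_val)) / len(y_val)


def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(X_train[0])
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    cache = {}

    def fit(mask):
        # Cache fitness per mask so repeated chromosomes are not re-evaluated.
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X_train, y_train, X_val, y_val)
        return cache[key]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if fit(a) >= fit(b) else b

    for _ in range(generations):
        next_gen = [max(population, key=fit)[:]]  # elitism: keep the best mask
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fit)
    return best, fit(best)


if __name__ == "__main__":
    # Synthetic data (NOT the PIDD): 8 attributes, only attributes 0 and 1 depend on the class.
    data_rng = random.Random(42)

    def make(count):
        X, y = [], []
        for _ in range(count):
            label = data_rng.randint(0, 1)
            row = [data_rng.gauss(2.0 * label, 1.0), data_rng.gauss(-1.5 * label, 1.0)]
            row += [data_rng.gauss(0.0, 1.0) for _ in range(6)]
            X.append(row)
            y.append(label)
        return X, y

    X_tr, y_tr = make(200)
    X_va, y_va = make(100)
    mask, acc = ga_select(X_tr, y_tr, X_va, y_va)
    print("selected attributes:", [i for i, b in enumerate(mask) if b], "accuracy:", acc)
    print("all attributes accuracy:", fitness([1] * 8, X_tr, y_tr, X_va, y_va))
```

Evaluating `fitness([1] * 8, ...)` alongside the GA result mirrors step 3, which compares NBs on all attributes with NBs on the selected subset.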
Figure 1. Proposed system: Pima Indian Diabetes Dataset with 8 attributes (features) → feature selection technique using GA → Naive Bayes (NBs) → classification of the diabetes disease dataset with 4 attributes and class "0" or "1".