451 Communication and Computing Systems – Prasad et al. (Eds) © 2017 Taylor & Francis Group, London, ISBN 978-1-138-02952-1

Classification of Pima Indian diabetes dataset using naive Bayes with genetic algorithm as an attribute selection

Dilip Kumar Choubey, Sanchita Paul & Santosh Kumar
CSE, BIT, Mesra, Ranchi, India

Shankar Kumar
Polytechnic, BIT, Mesra, Ranchi, India

ABSTRACT: Diabetes means that the blood sugar level stays above the desired level on a sustained basis. The prime objective of this research work is to provide a better classification of diabetes. Several methods have already been implemented for the classification of the diabetes dataset. In the medical sector, classification systems have been widely used to exploit patients' data and to build predictive models or sets of rules. In this manuscript, Naive Bayes (NBs) is first used for classification on all the attributes. A Genetic Algorithm (GA) is then used for attribute selection, and NBs is applied to the selected attributes for classification. The experimental results show the performance of this work on the Pima Indian Diabetes Dataset (PIDD) and indicate better classification for diagnosis.

1 INTRODUCTION

Diabetes is a major public health challenge worldwide. It is one of the most widespread diseases and is very common nowadays. In this manuscript, a Genetic Algorithm (GA) has been used as an attribute (feature) selection method, which selects four of the eight attributes. Naive Bayes (NBs) is a statistical, supervised learning method for classification. Here, NBs has been used to classify cases for diabetes diagnosis.

The paper is organized as follows: the proposed methodology is discussed in Section 2, results and discussion are presented in Section 3, and conclusions and future directions are given in Section 4.

2 PROPOSED METHODOLOGY

The proposed methodology uses GA for attribute selection and NBs for classification on PIDD, which has been taken from the UCI Machine Learning Repository. Figure 1 shows the block diagram of the proposed approach, which proceeds as follows:
1. Take the PIDD from the UCI Machine Learning Repository.
2. Apply GA for attribute selection on PIDD.
3. Perform classification with NBs both on the selected attributes and on all the attributes of PIDD.

Figure 1. Proposed system: Pima Indian Diabetes Dataset with 8 attributes (features) → feature selection technique using GA → Naive Bayes (NBs) → classification of the diabetes disease dataset with 4 attributes into class "0" or "1".

2.1 Used diabetes disease dataset

The Pima Indian Diabetes Dataset (PIDD) has been taken from the UCI Machine Learning Repository. The same dataset has been used in earlier studies (Polat and Gunes 2007; Seera and Lim 2014; Lukka 2011; Gajni and Abadeh 2011; Choubey and Paul 2016; Ephzibah 2011; Choubey and Paul 2015).
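The three steps of Section 2 can be sketched as a wrapper: a GA searches over attribute subsets, and each subset is scored by the accuracy of a naive Bayes classifier trained on it. This is a minimal, stdlib-only illustration and not the authors' implementation. The paper does not state its GA encoding or parameters. The following choices are therefore all assumptions:

- binary-mask chromosomes,
- tournament selection,
- one-point crossover,
- bit-flip mutation,
- elitism,
- hold-out accuracy as the fitness,
- the parameter values,
- the Gaussian form of naive Bayes.

PIDD is not bundled here. The hypothetical helper `make_synthetic` therefore generates a stand-in dataset with eight numeric attributes and class labels 0/1. To use PIDD, replace it with rows loaded from the UCI file.

```python
import math
import random


def fit_gaussian_nb(X, y, features):
    """Estimate log class priors and per-class mean/variance of the chosen features."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for f in features:
            vals = [r[f] for r in rows]
            mean = sum(vals) / len(vals)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            stats.append((f, mean, var))
        model[c] = (math.log(len(rows) / len(y)), stats)
    return model


def predict(model, x):
    """Pick the class with the largest log prior + sum of Gaussian log-likelihoods
    (the 'naive' assumption: attributes are independent given the class)."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, stats) in model.items():
        score = log_prior
        for f, mean, var in stats:
            score -= 0.5 * math.log(2 * math.pi * var) + (x[f] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def fitness(mask, X_tr, y_tr, X_val, y_val):
    """Assumed GA fitness: validation accuracy of NB trained on the attributes set to 1."""
    features = [i for i, bit in enumerate(mask) if bit]
    if not features:
        return 0.0
    model = fit_gaussian_nb(X_tr, y_tr, features)
    return sum(predict(model, x) == t for x, t in zip(X_val, y_val)) / len(y_val)


def ga_select(X_tr, y_tr, X_val, y_val, n_features, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Evolve binary masks (1 = keep attribute) and return the best one found.
    Assumes n_features >= 2 so that one-point crossover has a cut point."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-training NB for masks already evaluated
            cache[key] = fitness(mask, X_tr, y_tr, X_val, y_val)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if score(a) >= score(b) else b

        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best


def make_synthetic(n, seed):
    """Hypothetical stand-in for PIDD: 8 attributes, labels 0/1.
    Attributes 0-3 shift with the class; attributes 4-7 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0) for _ in range(4)]
                 + [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_synthetic(300, seed=1)
    X_val, y_val = make_synthetic(200, seed=2)
    mask = ga_select(X_tr, y_tr, X_val, y_val, n_features=8)
    all_acc = fitness([1] * 8, X_tr, y_tr, X_val, y_val)
    ga_acc = fitness(mask, X_tr, y_tr, X_val, y_val)
    print("selected attributes:", [i for i, bit in enumerate(mask) if bit])
    print(f"NB accuracy, all attributes: {all_acc:.3f}; GA-selected: {ga_acc:.3f}")
```

The script compares NB on all eight attributes against NB on the GA-selected subset, mirroring step 3. The paper reports four selected attributes. This sketch does not force a subset size, so the number it keeps depends on the data and the assumed parameters. Scoring on the same validation split that guides the search is optimistic. A fair comparison would report accuracy on a separate test split or via cross-validation.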