International Journal of Computer Applications (0975-8887), Volume 54, No. 17, September 2012

A Novel Approach for Heart Disease Diagnosis using Data Mining and Fuzzy Logic

Nidhi Bhatla, GNDEC, Ludhiana, India
Kiran Jyoti, GNDEC, Ludhiana, India

ABSTRACT
Cardiovascular disease is a term used to describe a variety of heart diseases, illnesses, and events that impact the heart and circulatory system. A clinician uses several sources of data and tests to form a diagnostic impression, but not all of these tests are necessarily useful for diagnosing heart disease. The objective of our work is to reduce the number of attributes used in heart disease diagnosis, which in turn reduces the number of tests a patient must undergo. Our work also aims at increasing the efficiency of the proposed system. The observations show that Decision Tree and Naive Bayes using fuzzy logic outperformed the other data mining techniques.

Keywords
Cardiovascular disease; data mining; fuzzy logic; Weka tool; decision tree; naive Bayes; classification via clustering.

1. INTRODUCTION
The WHO report Global Atlas on Cardiovascular Disease Prevention and Control states that cardiovascular diseases (CVDs) are the leading causes of death and disability in the world. Although a large proportion of CVDs is preventable, they continue to rise mainly because preventive measures are inadequate. Clinical problem solving, or diagnostic reasoning, is the skill that physicians use to understand a patient’s complaints and then to identify a short, prioritized list of possible diagnoses that could account for those complaints. This differential diagnosis then drives the choice of diagnostic tests and possible treatments. Despite striking advances in information technology, clinical problem solving has not yet been effectively replicated by computers, making it essential that clinicians work to develop expertise in this very important skill set.
Hence, more adequate systems for the diagnosis of cardiovascular disease need to be developed. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Today, data mining is applied successfully in various fields, including healthcare. Fuzzy logic, on the other hand, provides a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input information. Our work attempts to combine both of these techniques in developing the proposed system and increasing its efficiency.

2. RELATED WORK
A large number of systems, such as information systems, decision support systems, and image and scan processing systems, have been deployed in the healthcare sector for effective diagnosis of various diseases. Our work is an endeavour to predict accurately the presence of cardiac disease with a reduced number of attributes.

P.K. Anooj (2012) [1] developed a Clinical Decision Support System for heart disease using weighted fuzzy rules. E.P. Ephzibah et al. (2012) [2] framed fuzzy rules for heart disease diagnosis using 6 attributes. Sulabha S. Apte et al. (2012) [4] compared various data classification techniques using 15 attributes for heart disease diagnosis. M. Anbarasi et al. (2010) [5] developed an Enhanced Prediction System for heart disease with feature subset selection using a Genetic Algorithm. Moreover, three classifiers (Decision Tree, Naive Bayes and Classification via Clustering) were used, and Decision Tree performed well, with a prediction probability of 99.2%. B. Patil et al. (2009) [11] used an Artificial Neural Network to develop a heart disease prediction system. Carlos (2006) [19] compared Association Rules and Decision Trees for disease prediction.

The remaining sections are organized as follows. Section 3 explains the data set used. Section 4 discusses the development of the heart disease prediction system using fuzzy logic.
Section 5 illustrates the classification process and outcomes. Section 6 exhibits the efficiency of the proposed system.

3. DATA SET
In our work, six attributes have been reduced to four attributes, which are employed for heart disease prediction. The data of various patients is entered into the proposed system, and the diagnostic results that the system generates for each patient are saved in the database. The resulting data set is used by the classification model to calculate the efficiency of the proposed system. Attributes have been converted to categorical form for more clarity [5]. Moreover, the training set method is used as the test mode.

Fig 1: Attributes list

Fig. 1 illustrates the original list of attributes and Fig. 2 illustrates the reduced set of attributes.

Input Attributes:
1. Type - Chest Pain Type
2. Rbp - Resting blood pressure
3. Eia - Exercise induced angina
4. Oldpk - Old peak
5. Vsl - No. of vessels colored
6. Thal - Maximum heart rate achieved
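
The conversion of attributes to categorical form mentioned in this section can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the paper does not report the bins it used, so the cut points (120 and 140 mmHg), the labels, the sample patient record, and the helper names `to_category` and `categorize_record` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut points (mmHg) and labels for resting blood pressure (Rbp).
# The paper does not state the bins it actually used.
RBP_CUTS = [120, 140]
RBP_LABELS = ["low", "medium", "high"]


def to_category(value, cuts, labels):
    """Map a numeric value to the label of the interval it falls in.

    With sorted cut points c0 < c1 < ..., label 0 covers value < c0,
    label i covers c(i-1) <= value < c(i), and the last label covers
    every value at or above the last cut point.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cuts, value)]


def categorize_record(record):
    """Return a copy of a patient record with Rbp replaced by its category."""
    converted = dict(record)  # copy so the caller's record is left untouched
    converted["Rbp"] = to_category(record["Rbp"], RBP_CUTS, RBP_LABELS)
    return converted


# Made-up record for illustration only; not taken from the paper's data set.
patient = {"Type": "typical angina", "Rbp": 145, "Eia": "yes"}
print(categorize_record(patient))
```

The same helper could be reused for other numeric attributes such as Oldpk by supplying a different, equally hypothetical, pair of cut points and labels.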