IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 87-91
www.iosrjournals.org
DOI: 10.9790/0661-1803048791

Heart Disease Detection using EKSTRAP Clustering with Statistical and Distance based Classifiers

Terence Johnson 1, Dr. S. K. Singh 2, Vaishnavi Kamat 3, Aishwarya Joshi 4, Lester D'Souza 4, Poohar Amonkar 4, Devyani Joshi 4, Anirudha Kulkarni 4

1 (Ph.D Scholar, Information Technology Department, AMET University, India)
1 (Asst. Prof., Computer Engineering Department, AITD, Goa University, India)
2 (Head, Information Technology Department, TCSC, Mumbai University, India)
3 (Asst. Prof., Computer Engineering Department, AITD, Goa University, India)
4 (Students, Computer Engineering Department, AITD, Goa University, India)

Abstract: The heart is a vital organ that pumps blood to the various parts of the body. If blood circulation is inefficient, organs such as the brain suffer, and if the heart stops pumping blood, death results. An individual's life therefore depends heavily on how efficiently the heart works. Using the data mining technique proposed in this paper, we attempt to detect whether or not a patient has heart disease. The system uses 13 attributes, such as age, gender, blood pressure and cholesterol, for this detection. It is a hybrid technique: the Enhanced K STRAnge Points (EKSTRAP) clustering algorithm is applied first, and its output is given to two different classifiers, the statistical Naïve Bayes classifier and the distance-based Modified Simple Distance Classifier (MSDC).

Keywords: Enhanced K Strange, clustering, Heart Disease, Naïve Bayes Classifier

I. Introduction

When we have a large pre-existing dataset, we can examine and process it to generate new information that was not previously observed.
For this processing, different data mining techniques can be used; in this paper, clustering and classification techniques are used [1]. Clustering is an unsupervised process of grouping similar objects from a given dataset. Similarity can be determined using measures such as the Euclidean distance [2]. Clustering is achieved here by the Enhanced K Strange Points clustering algorithm [3]. Classification, by contrast, is a supervised technique that determines which of the given groups a point belongs to. Statistical classification [4] and distance-based classification [5] techniques are used. Naïve Bayes [6], a simple probabilistic classifier, is the statistical classifier used, and the Modified Simple Distance Classifier is the distance-based classifier used in this paper. Heart disease detection [7] tells whether or not a new data object has heart disease, based on the training data given to the classifier. The training data is obtained by clustering. The dataset used is the Cleveland dataset, taken from the UCI Repository.

The block diagram below explains the flow of the implemented system. The entire dataset is grouped into classes using the clustering algorithm. The output of the clustering algorithm, along with the new tuple, is given to the classifier, which then detects the class to which the new tuple belongs.

Fig. 1. Block Diagram

The clustering and classification are done based on the following attributes:
1. Age - Age in years.
2. Gender
   a. 1 is male
   b. 0 is female
3. Cp - Chest pain type
   a. value 1 - Typical angina
   b. value 2 - Atypical angina
   c. value 3 - Non-anginal pain
   d. value 4 - Asymptomatic