An Overview of Data Mining Techniques Applied for Heart Disease Diagnosis and Prediction

Salha M. Alzahani, Afnan Althopity, Ashwag Alghamdi, Boushra Alshehri, and Suheer Aljuaid
Dept. of Computer Science, College of Computers and Information Technology
Taif University, Taif, Saudi Arabia
Email: s.zahrani@tu.edu.sa

Abstract: Data mining techniques have been applied successfully in many fields, including business, science, the Web, cheminformatics, and bioinformatics, and to different types of data such as textual, visual, spatial, real-time, and sensor data. Medical data, however, is still information-rich but knowledge-poor: there is a lack of effective analysis tools to discover the hidden relationships and trends in medical data obtained from clinical records. This paper reviews the state-of-the-art research on heart disease diagnosis and prediction. Specifically, we present an overview of current research that uses data mining techniques to enhance heart disease diagnosis and prediction, including decision trees, Naive Bayes classifiers, K-nearest neighbour (KNN) classification, support vector machines (SVM), and artificial neural networks. Results show that SVM and neural networks perform well in predicting the presence of coronary heart disease (CHD). Decision trees applied after feature reduction are the recommended classifier for diagnosing cardiovascular disease (CVD). The performance of data mining techniques in detecting coronary artery disease (CAD) is still not encouraging (between 60% and 75%), and further improvements should be pursued.

Index Terms: heart disease, data mining, decision tree, Naive Bayes, K-nearest neighbour, support vector machine

I. INTRODUCTION

Knowledge discovery in data is defined as: “the extraction of hidden previously unknown and potentially useful information about data” [1]. Basically, knowledge discovery in data is the process of extracting different features from data in various steps.
Fig. 1 shows the process of knowledge discovery from various data sources in a specific domain. Data mining is the core step, which results in the discovery of implicit but potentially valuable knowledge from huge amounts of data. Data mining technology provides the user with methods to find new and implicit patterns in massive data. In the healthcare domain, discovered knowledge can be used by healthcare administrators and physicians to improve the accuracy of diagnosis, to enhance the quality of surgical operations, and to reduce the harmful effects of drugs [2], [3]. It also aims to propose less expensive therapies [4].

Figure 1. Process of knowledge discovery in data.

Manuscript received July 18, 2014; revised December 15, 2014.

The diagnosis of diseases is a difficult but critical task in medicine. The detection of heart disease from “various factors or symptoms is a multi-layered issue which is not free from false presumptions often accompanied by unpredictable effects” [5]. Thus, we can use patients' data that have been collected and recorded to ease the diagnosis process and to utilize the knowledge and experience of the numerous specialists who have dealt with the same symptoms of diseases.

Providing valuable services at lower cost is a major constraint faced by healthcare organizations (hospitals, polyclinics, and medical centres). According to [6], “valuable quality service denotes the accurate diagnosis of patients and providing efficient treatment. Poor clinical decisions may lead to disasters and hence are seldom entertained”. Besides, it is essential that hospitals decrease the cost of clinical tests. Expert computerized systems based on machine learning and data mining methods should help, in one way or another, to carry out clinical tests or diagnosis at reduced risks [7], [8].
This paper aims to provide a survey of current knowledge discovery techniques based on data mining as applied to medical research, and particularly to heart disease prediction. Studies published between 2010 and 2014 are discussed, together with any significant earlier study that should be mentioned. A number of experiments and research works have compared the performance of predictive data mining techniques such as decision trees, Naive Bayes, K-nearest neighbour, support vector machines, and artificial neural networks. This paper discusses the results of the state-of-the-art techniques and gives conclusions towards future research.

©2014 Engineering and Technology Publishing. Lecture Notes on Information Theory, Vol. 2, No. 4, December 2014. doi: 10.12720/lnit.2.4.310-315