An Overview of Data Mining Techniques
Applied for Heart Disease Diagnosis and
Prediction
Salha M. Alzahani, Afnan Althopity, Ashwag Alghamdi, Boushra Alshehri, and Suheer Aljuaid
Dept. of Computer Science, College of Computers and Information Technology
Taif University, Taif, Saudi Arabia
Email: s.zahrani@tu.edu.sa
Abstract — Data mining techniques have been applied
successfully in many fields including business, science, the
Web, cheminformatics, bioinformatics, and on different
types of data such as textual, visual, spatial, real-time and
sensor data. Medical data, however, remains information-rich
but knowledge-poor. There is a lack of effective analysis tools to
discover the hidden relationships and trends in medical data
obtained from clinical records. This paper reviews the state-
of-the-art research on heart disease diagnosis and
prediction. Specifically, we present an overview of current
research that uses data mining techniques to enhance heart
disease diagnosis and prediction, including decision trees,
Naive Bayes classifiers, K-nearest neighbour (KNN)
classification, support vector machines (SVM), and artificial
neural networks. The reviewed results show that SVM and
neural networks achieve high performance in predicting the
presence of coronary heart disease (CHD). Decision trees
applied after feature reduction are the recommended
classifier for diagnosing cardiovascular disease (CVD).
However, the performance of data mining techniques in
detecting coronary artery disease (CAD) is not yet
encouraging (between 60% and 75%), and further
improvements should be pursued.
Index Terms—heart disease, data mining, decision tree,
Naive Bayes, K-nearest neighbour, support vector machine
I. INTRODUCTION
Knowledge discovery in data is defined as: “the
extraction of hidden previously unknown and potentially
useful information about data” [1]. Essentially, knowledge
discovery in data is the process of extracting different
features from data in several steps. Fig. 1 shows the
process of knowledge discovery from various data
sources in a specific domain. Data mining is the core
step, which results in the discovery of implicit but
potentially valuable knowledge from huge amounts of data.
Data mining technology provides the user with
methods to find new and implicit patterns in massive
data. In the healthcare domain, discovered knowledge can
be used by healthcare administrators and physicians
to improve the accuracy of diagnosis, to enhance the
quality of surgical operations, and to reduce the harmful
effects of drugs [2], [3]. It also aims to propose less
expensive therapies [4].
Manuscript received July 18, 2014; revised December 15, 2014.
Figure 1. Process of knowledge discovery in data.
The diagnosis of diseases is a difficult but critical task
in medicine. The detection of heart disease from “various
factors or symptoms is a multi-layered issue which is not
free from false presumptions often accompanied by
unpredictable effects” [5]. Thus, patient data that have
been collected and recorded can be used to ease the
diagnosis process and to draw on the knowledge and
experience of the numerous specialists who have dealt with
the same symptoms. Providing valuable services at lower
cost is a major constraint faced by healthcare organizations
(hospitals, polyclinics, and medical centres). According to [6],
“valuable quality service denotes the accurate diagnosis of
patients and providing efficient treatment. Poor clinical
decisions may lead to disasters and hence are seldom
entertained”. In addition, it is essential that hospitals
decrease the cost of clinical tests. Computerized expert
systems based on machine learning and data mining
methods can help, in one way or another, to carry out
clinical tests or diagnosis at reduced risk [7], [8].
This paper aims to provide a survey of current
knowledge discovery techniques based on data mining
as applied to medical research, and particularly to
heart disease prediction. Studies published between 2010
and 2014 are discussed, together with significant earlier
studies where they merit mention. A number of experiments
and research works have compared the performance of
predictive data mining techniques such as decision trees,
Naive Bayes, K-nearest neighbour, support vector machines,
and artificial neural networks. This paper discusses the
results of the state-of-the-art techniques and draws
conclusions for future research.
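To make concrete how the predictive techniques named above are typically
compared, the following minimal sketch implements one of them, a K-nearest
neighbour classifier, and scores it on held-out records. It is an
illustration only and is not taken from any of the surveyed studies: the
function names (knn_predict, accuracy), the two features, and all records
are hypothetical and invented for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training records closest to query.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out records whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical, made-up records (NOT real patient data).
# Features: (age in decades, resting blood pressure / 100); label 1 = disease.
TRAIN = [
    ((4.0, 1.10), 0), ((4.5, 1.20), 0), ((5.0, 1.20), 0), ((3.8, 1.15), 0),
    ((6.0, 1.50), 1), ((6.5, 1.60), 1), ((5.8, 1.45), 1), ((7.0, 1.55), 1),
]
TEST = [((4.2, 1.15), 0), ((6.2, 1.50), 1), ((5.1, 1.25), 0), ((6.8, 1.60), 1)]

if __name__ == "__main__":
    print("hold-out accuracy:", accuracy(TRAIN, TEST, k=3))
```

The same hold-out accuracy measure could be computed for any of the other
classifiers, which is the kind of comparison reported in the studies
reviewed here. The features are left unscaled for brevity; a real
experiment would normally normalise them and use a much larger dataset.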
©2014 Engineering and Technology Publishing
Lecture Notes on Information Theory Vol. 2, No. 4, December 2014
doi: 10.12720/lnit.2.4.310-315