Identification of Heart Failure by Using Unstructured
Data of Cardiac Patients
Muhammad Saqlain, Wahid Hussain, Nazar A. Saqib, Muazzam A. Khan
College of Electrical and Mechanical Engineering (E&ME)
National University of Sciences and Technology (NUST)
Islamabad, Pakistan
m.saqlain1240@yahoo.com, wahid.hussain.bangash@gmail.com, {nazar.abbas, muazzamak}@ce.ceme.edu.pk
Abstract: The incidence of Heart Failure (HF) is rising
steadily, and HF is the leading cause of death in our
society. It is also among the most expensive diseases.
Early detection can reduce the social and individual burden
of HF by providing the means to slow progression of the
disease and to help patients recover. In this study, we
apply data mining techniques to extract useful information
from patients' medical reports and, using a machine
learning classification algorithm, propose a risk model
that predicts survival of 1 year or more for patients
diagnosed with HF. For multi-class classification we use
the multinomial Naïve Bayes (NB) algorithm. We obtained our
data from the Armed Forces Institute of Cardiology (AFIC),
Pakistan, in the form of patient medical reports, which are
available in both structured and unstructured formats. Much
of the useful information is buried in the unstructured
data. Our proposed model achieved an accuracy of 86.7% and
an Area under the Curve (AUC) of 92.4%.
Keywords: Heart Failure; Classification Techniques;
Feature Selection; Naïve Bayes; Survival Analysis
I. INTRODUCTION
Cardiac disease is a major health issue and is the
leading cause of death worldwide. It can lead to serious
cardiovascular events such as stroke and heart attack. In
the general community, the risk of developing heart
failure for an individual at the age of 40 is 1 in 5 [1].
Nearly 6.6 million adults in the US had HF in 2010, at a
health care cost of 34.4 billion US dollars [20].
Heart failure risk assessment is crucial for finding
prevention opportunities. The basic steps of heart disease
risk assessment are to identify the risk factors and to
track their progression. Today, the high mortality rate of
HF patients is a priority for all major public health care
centers [2]. For lack of an efficient means of HF
prediction, little progress has been made in controlling
the progression of HF. The social and individual burden
can be reduced by early prediction of HF, by lifestyle
changes, and by preventive therapies.
HF is a highly heterogeneous and complex disease that
is difficult to detect because of its variety of unusual
signs and symptoms [3]. Examples of HF risk factors
include very low Left Ventricular Ejection Fraction
(LVEF), hypertension, diabetes, hyperlipidemia, anemia,
medication, smoking history, and family history. An
accurate prediction model for HF can be very useful to
physicians as well as patients. On the basis of an accurate
risk prediction, a physician can recommend a suitable
treatment plan, and patients can follow that plan more
confidently.
Raw data are available in the form of complex reports,
patients' medical histories, and electronic test results
[4]. These medical reports contain both structured and
unstructured data. Structured data can be used directly in
a risk prediction model. However, a lot of valuable
information is buried in the unstructured data, which is
hard to use because it is discrete, complex,
multi-dimensional, and noisy [10]. We collected patient
reports from a well-known hospital in Pakistan, the Armed
Forces Institute of Cardiology (AFIC). The objective of
our research is to mine useful information from these
reports with the help of cardiologists and researchers,
and to design a predictive model, based on the Naïve Bayes
(NB) classifier, that predicts survival of 1 year or more
for HF patients. Our dataset is time-based: we use data
only for those patients whose final reports were submitted
within a 1-year time period after the HF diagnosis,
whether or not they survived. Using this model, we can
also estimate the mortality rate of HF patients in our
society, and the model supports knowledge discovery that
helps medical practitioners and researchers anticipate the
condition of HF patients before it becomes critical.
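The classification step described above can be illustrated
with a minimal sketch of multinomial Naïve Bayes over word
counts, assuming each report has already been reduced to
plain text. The tokenizer, the class name MultinomialNB,
the smoothing parameter alpha, and the toy reports and
labels below are all hypothetical illustrations; they are
not the AFIC data or the feature set used in this study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a report and keep only purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        class_counts = Counter(labels)
        # Log prior of each class from its share of training reports.
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        # Per-class word frequencies.
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Unnormalized log posterior of each class for one report."""
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][tok]
                    score += math.log(
                        (count + self.alpha)
                        / (self.total[c] + self.alpha * vocab_size)
                    )
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy reports and outcome labels, for illustration only.
train_reports = [
    "low ejection fraction with severe dyspnea and edema",
    "severe dyspnea low ejection fraction renal failure",
    "mild hypertension preserved ejection fraction stable",
    "stable patient preserved ejection fraction mild symptoms",
]
train_labels = ["died", "died", "survived", "survived"]

model = MultinomialNB().fit(train_reports, train_labels)
print(model.predict("severe dyspnea and low ejection fraction"))
```

The sketch accepts any number of class labels, so the same
code covers a multi-class setting. Scores are accumulated
in log space to avoid numerical underflow on long reports.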
The rest of the paper is organized as follows.
Section II reviews related studies by other researchers.
In Section III, we explain our proposed methodology.
Section IV contains the results and analysis of different
classification models. Finally, Section V provides the
conclusion and related work, with an overall summary of
this research.
2016 45th International Conference on Parallel Processing Workshops
2332-5690/16 $31.00 © 2016 IEEE
DOI 10.1109/ICPPW.2016.66
427