Identification of Heart Failure by Using Unstructured Data of Cardiac Patients

Muhammad Saqlain, Wahid Hussain, Nazar A. Saqib, Muazzam A. Khan
College of Electrical and Mechanical Engineering (E&ME), National University of Sciences and Technology (NUST), Islamabad, Pakistan
m.saqlain1240@yahoo.com, wahid.hussain.bangash@gmail.com, {nazar.abbas, muazzamak}@ce.ceme.edu.pk

Abstract
The occurrence of Heart Failure (HF) is increasing day by day, and HF is the leading cause of death in our society. It is also among the most expensive diseases. The social and individual burden of this disease can be reduced by early detection of HF, which would provide the means to slow the progression of the disease and to help patients recover to good health. In this study, we apply data mining techniques to extract useful information from patients' medical reports and, using a machine learning classification algorithm, propose a risk model to predict survival of 1 year or more for patients diagnosed with HF. To perform multi-class classification, we use the multinomial Naïve Bayes (NB) classification algorithm. We obtained the required data from the Armed Forces Institute of Cardiology (AFIC), Pakistan, in the form of patients' medical reports, which are available in both structured and unstructured formats. Unfortunately, a lot of information is buried in the unstructured data. Our proposed model achieved an accuracy of 86.7% and an Area Under the Curve (AUC) of 92.4%.

Keywords
Heart Failure; Classification Techniques; Feature Selection; Naïve Bayes; Survival Analysis

I. INTRODUCTION
Cardiac disease is a major health issue and is the leading cause of death worldwide. It can lead to serious cardiovascular events such as stroke and heart attack. In the general community, the risk of developing heart failure for an individual at the age of 40 is 1 in 5 [1].
In 2010, nearly 6.6 million adults in the US were reported to have HF, at a health care cost of 34.4 billion US dollars [20]. HF risk assessment is crucial for finding prevention opportunities. The basic steps of heart disease risk assessment are to identify the risk factors and to track their progression. The high mortality rate of HF patients is currently a priority for all major public health care centers [2]. Because there is no efficient means of predicting HF, very little progress has been made in controlling its progression. The social and individual burden can be reduced by early prediction of HF, by lifestyle changes, and by establishing preventive therapies. HF is a very heterogeneous and complex disease that is difficult to detect because of the variety of its unusual signs and symptoms [3]. Examples of HF risk factors include very low Left Ventricular Ejection Fraction (LVEF), hypertension, diabetes, hyperlipidemia, anemia, medication, smoking history, and family history.

An accurate prediction model for HF can be very useful for physicians as well as for patients. On the basis of an accurate risk prediction, a physician can recommend a valid treatment plan, and patients can follow that plan more confidently. Raw data are available in the form of complex reports, patients' medical histories, and electronic test results [4]. These medical reports contain both structured and unstructured data. Structured data can be used directly in a risk prediction model. However, a lot of valuable information is buried in the unstructured data, which is hard to use because it is very discrete, complex, multi-dimensional, and noisy [10]. We collected patients' reports from a well-known hospital in Pakistan, the Armed Forces Institute of Cardiology (AFIC).
The objective of our research is to mine the useful information from these reports with the help of cardiologists and researchers, and to design a predictive model, based on the Naïve Bayes (NB) classifier, that predicts survival of 1 year or more for HF patients. Our dataset is time-based: we use data only for those patients whose final reports were submitted within a 1-year period, whether or not they survived after the HF diagnosis. Using this model, we can therefore also estimate the mortality rate of HF patients in our society, and the model can serve as a source of knowledge discovery that helps medical practitioners and researchers predict the condition of HF patients before it becomes critical.

The rest of the paper is organized as follows. Section II reviews related studies by different researchers. Section III explains our proposed methodology. Section IV presents the results and analysis of different classification models. Finally, Section V gives the conclusion and related work, providing an overall summary of this research.

2016 45th International Conference on Parallel Processing Workshops 2332-5690/16 $31.00 © 2016 IEEE DOI 10.1109/ICPPW.2016.66
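To make the idea of a multinomial Naïve Bayes survival classifier over report text more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple bag-of-words representation with Laplace smoothing. The toy report snippets, the class labels `died_within_1yr` and `survived_1yr`, and all function and class names are hypothetical and invented purely for illustration; they are not AFIC data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a free-text report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Class priors from document frequencies, stored in log space.
        doc_counts = Counter(labels)
        self.log_prior = {
            c: math.log(doc_counts[c] / len(labels)) for c in self.classes
        }
        # Total number of tokens observed in each class.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Return the unnormalized log posterior of each class for one report."""
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip unseen words
        scores = {}
        for c in self.classes:
            denom = self.total[c] + self.alpha * len(self.vocab)
            score = self.log_prior[c]
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy snippets standing in for unstructured report text.
docs = [
    "severely reduced lvef diabetes hypertension anemia",
    "reduced lvef smoker hyperlipidemia diabetes",
    "normal lvef mild hypertension",
    "preserved lvef nonsmoker normal hemoglobin",
]
labels = ["died_within_1yr", "died_within_1yr", "survived_1yr", "survived_1yr"]

model = MultinomialNB().fit(docs, labels)
print(model.predict("reduced lvef anemia diabetes"))
print(model.predict("normal lvef nonsmoker"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing constant keeps a word that never appeared in one class from forcing that class's probability to zero. A real pipeline would also need the feature selection and expert-guided extraction steps that the paper describes, which this sketch omits.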