International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 01 | Jan 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 750

AN EFFICIENT APPROACHES FOR CLASSIFYING AND PREDICTING HEART DISEASE USING MACHINE LEARNING TECHNIQUES

M. ELAMATHI¹, C. USHA NANDHINI²
¹ Research scholar, M.Phil. Computer Science, Vellalar College for Women, Erode12.
² Assistant Professor, Department of Computer Applications, Vellalar College for Women, Tamilnadu, India.

Abstract - Data mining techniques have been widely used in the medical field for the prediction and diagnosis of various diseases. One of the most important applications of such systems is the diagnosis of heart diseases. Today data is collected in tremendous amounts, so humans need to depend on machines. In recent years, heart disorders have increased sharply, and heart disease is becoming one of the most fatal diseases in several countries. Most datasets suffer from outliers, which reduce classification accuracy. Here, outliers are defined in terms of missing values, incorrect or irrelevant data, and inappropriate values in the dataset. Data transformation is another important preprocessing method: it transforms data into forms appropriate for mining by performing summary or aggregation operations. For feature selection, a filter method (Remove Redundant Features using correlation) and a wrapper method (Recursive Feature Elimination) are applied. Within preprocessing, missing values are handled by the “remove with values” and class mean imputation methods. Classification methods such as KNN, Random Forest and Naïve Bayes are applied to the original datasets as well as to the datasets after feature selection.
All these processes are applied to three different heart disease datasets to analyze the effect of preprocessing on performance in terms of accuracy rate.

Key Words: Classification, SVM, Naive Bayes, Random Forest, K-nearest neighbor.

1. INTRODUCTION

DATA MINING
Data mining is the process of automatically extracting useful information from huge amounts of data. It has become increasingly important as real-life data grows enormously [3]. A heart disease prediction system can assist medical professionals in predicting the state of the heart based on the clinical data of patients fed into the system. Many tools that use prediction algorithms are available, but they have some flaws: most of them cannot handle big data. Many hospitals and healthcare industries collect huge amounts of patient data, which become difficult to handle with currently existing systems [1]. Machine learning algorithms play a vital role in analyzing these data sets and deriving hidden knowledge and information from them, improving accuracy and speed.

HEART DISEASE
Heart disease is the most common cause of death for both sexes. Here are some statistics demonstrating the scale of heart disease in the U.S. There are two main lines of treatment for heart disease. Initially, a person can attempt to treat the heart condition using medication. If this does not have the desired effect, surgical options are available to help correct the issue.

SYMPTOMS
Symptoms of a heart attack may include:
• Chest pain or discomfort: a sensation of pressure, tightness or squeezing in the centre of the chest
• Feeling lightheaded or dizzy
• Sweating
• Fatigue and coughing or wheezing
• An overwhelming sense of anxiety
• Pain that often starts in the chest and then moves towards the arms, especially on the left side

DATA SETS
In this work, experiments are performed on heart disease datasets collected from the UCI Machine Learning Repository.
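
Because these datasets contain missing values, the class mean imputation step named in the abstract can be illustrated with a small sketch. This is only one plausible reading of that step, not the authors' implementation: the helper name `class_mean_impute`, the dict-based record layout, the use of `None` for a missing value, and the toy values are all assumptions made for illustration. The column names `age`, `chol` and `num` are borrowed from the UCI heart disease attribute list, with `num` treated as the class label.

```python
from collections import defaultdict


def class_mean_impute(rows, label_key, missing=None):
    """Fill each missing numeric value with the mean of that attribute
    over the rows that share the same class label (hypothetical helper)."""
    sums = defaultdict(float)  # (label, attribute) -> sum of observed values
    counts = defaultdict(int)  # (label, attribute) -> number of observed values
    for row in rows:
        for key, value in row.items():
            if key != label_key and value is not missing:
                sums[(row[label_key], key)] += value
                counts[(row[label_key], key)] += 1

    imputed = []
    for row in rows:
        filled = dict(row)  # copy, so the input rows are left unchanged
        for key, value in row.items():
            if key != label_key and value is missing:
                group = (row[label_key], key)
                if counts[group]:  # keep the gap if the class has no observed value
                    filled[key] = sums[group] / counts[group]
        imputed.append(filled)
    return imputed


# Toy records with made-up values; None marks a missing measurement.
rows = [
    {"age": 63, "chol": 233, "num": 1},
    {"age": 67, "chol": None, "num": 1},
    {"age": 41, "chol": 204, "num": 0},
    {"age": 56, "chol": 236, "num": 0},
    {"age": 57, "chol": None, "num": 0},
    {"age": 62, "chol": 267, "num": 1},
]
imputed = class_mean_impute(rows, label_key="num")
```

In this toy input, the missing `chol` value in each row is replaced by the mean of the observed `chol` values within the same `num` class, so the two classes receive different fill values. A row is left with its gap when its class has no observed value for that attribute.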

The repository currently maintains 394 data sets as a service to the machine learning community. The heart disease instances used here have 14 attributes, named age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal and num. The Heart Disease Data Set has 4 databases, namely Cleveland, Hungary, Switzerland and the VA Long Beach.

2.1 LITERATURE REVIEW
Abiraami T et al. [2018] analyze the performance of various machine learning classification algorithms, such as Support Vector Machine (SVM), Decision Tree (J48) and Naïve Bayes (NB) with the bagging technique, on a diabetic heart disease dataset. The efficiency of the classification algorithms is judged by performance, accuracy, precision, specificity and sensitivity. All tests are performed in the WEKA tool and