International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 01 | Jan 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 750
EFFICIENT APPROACHES FOR CLASSIFYING AND PREDICTING
HEART DISEASE USING MACHINE LEARNING TECHNIQUES
M. ELAMATHI¹, C. USHA NANDHINI²
¹ Research scholar, M.Phil. Computer Science, Vellalar College for Women, Erode12.
² Assistant Professor, Department of Computer Applications, Vellalar College for Women, Tamilnadu, India.
---------------------------------------------------------------------***--------------------------------------------------------------------- -
Abstract - Data mining techniques have been widely used in
the medical field for the prediction and diagnosis of various
diseases. One of the most important applications of such
systems is the diagnosis of heart disease. Today data is
collected in tremendous amounts, so humans need to depend on
machines to analyze it. In recent years, heart disorders have
increased sharply, and heart disease is becoming one of the
most fatal diseases in several countries. Most datasets suffer
from outliers, which reduce classification accuracy. Here,
outliers are defined as missing values, incorrect or
irrelevant data, and inappropriate values in the dataset.
Data transformation is another important preprocessing
method: it transforms data into forms appropriate for mining
by performing summary or aggregation operations. For feature
selection, a filter method (removing redundant features using
correlation) and a wrapper method (Recursive Feature
Elimination) are applied. Missing values are handled by the
“remove with values” and class mean imputation methods.
Classification methods such as KNN, Random Forest and Naïve
Bayes are applied to the original datasets as well as to the
datasets after feature selection. All these processes are
applied to three different heart disease datasets to analyze
the effect of preprocessing on accuracy.
Key Words: Classification, SVM, Naive Bayes, Random Forest,
K-nearest neighbor.
1. INTRODUCTION
DATA MINING
Data mining is the process of automatically extracting
useful knowledge from huge amounts of data. It has become
increasingly important as the volume of real-life data grows
enormously [3]. A heart disease prediction system can assist
medical professionals in predicting the state of the heart
based on the clinical data of patients fed into the system.
Many tools that use prediction algorithms are available, but
they have some flaws; most of them cannot handle big data.
Many hospitals and healthcare organizations collect huge
amounts of patient data, which is difficult to handle with
currently existing systems [1]. Machine learning algorithms
play a vital role in analyzing these data sets and deriving
hidden knowledge and information from them, improving both
accuracy and speed.
HEART DISEASE
Heart disease is the most common cause of death for both
sexes, and statistics for the U.S. demonstrate its scale.
There are two main lines of treatment for heart disease.
Initially, a person can attempt to treat the heart condition
using medication. If this does not have the desired effect,
surgical options are available to help correct the issue.
SYMPTOMS
Symptoms of a heart attack may include:
Chest pain or discomfort: a sensation of pressure,
tightness or squeezing in the centre of the chest
Feeling lightheaded or dizzy
Sweating
Fatigue and coughing or wheezing
An overwhelming sense of anxiety
The pain often starts in the chest and then moves
towards the arms, especially on the left side.
DATA SETS
In this work, experiments are performed on heart
disease datasets collected from the UCI Machine Learning
Repository, which currently maintains 394 data sets as a
service to the machine learning community. The instances
have 14 attributes, named age, sex, cp, trestbps, chol, fbs,
restecg, thalach, exang, oldpeak, slope, ca, thal and num.
The Heart Disease Data Set has 4 databases, namely Cleveland,
Hungary, Switzerland and the VA Long Beach.
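The 14-attribute record layout, together with the two missing-value
treatments named in the abstract, can be sketched in a few lines of
Python. This is only an illustrative sketch and not the authors'
implementation: the sample rows are made up, the "?" missing-value
marker follows the convention of the processed UCI files, the function
names are hypothetical, and reading "remove with values" as deleting
every record that contains a missing value is an assumption.

```python
import csv
import io
from statistics import mean

# The 14 attribute names of the heart disease data; "num" is the class label.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Made-up rows for illustration only; "?" marks a missing value.
SAMPLE = """\
63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,?,0,2,108,1,1.5,2,3,3,2
41,0,2,130,204,0,2,172,0,1.4,1,0,3,0
56,1,4,?,256,1,2,142,1,0.6,2,1,6,2
62,0,4,140,268,0,2,160,0,3.6,3,2,3,2
"""


def load_records(text):
    """Parse comma-separated rows into dicts; '?' becomes None."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        values = [None if cell.strip() == "?" else float(cell) for cell in row]
        records.append(dict(zip(COLUMNS, values)))
    return records


def drop_incomplete(records):
    """Assumed reading of 'remove with values': drop records with any gap."""
    return [r for r in records if None not in r.values()]


def impute_class_mean(records, target="num"):
    """Fill each gap with the mean of that attribute within the same class."""
    filled = [dict(r) for r in records]
    for col in COLUMNS:
        if col == target:
            continue
        # Collect the observed values of this attribute per class label.
        by_class = {}
        for r in records:
            if r[col] is not None:
                by_class.setdefault(r[target], []).append(r[col])
        for r in filled:
            if r[col] is None and r[target] in by_class:
                r[col] = mean(by_class[r[target]])
    return filled


records = load_records(SAMPLE)
complete = drop_incomplete(records)    # keeps only fully observed records
imputed = impute_class_mean(records)   # keeps every record, gaps filled
```

Deleting incomplete records shrinks the data set, whereas class mean
imputation keeps every instance at the cost of replacing each gap with
an average taken over records of the same class.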
2.1 LITERATURE REVIEW
Abiraami T et al. [2018] analyze the performance
of various machine learning classification algorithms on a
diabetic heart disease dataset, namely Support Vector
Machine (SVM), Decision Tree (J48) and Naïve Bayes (NB) with
the bagging technique. The efficiency of the classification
algorithms is assessed by accuracy, precision, specificity
and sensitivity. All tests are performed in the Weka tool and