Analyzing Patterns of Numerously Occurring Heart
Diseases Using Association Rule Mining
K.M. Mehedi Hasan Sonet, Md. Mustafizur Rahman, Pritom Mazumder, Abid Reza, Rashedur M Rahman
Department of Electrical and Computer Engineering
North South University
sonet.hasan@northsouth.edu, rahman.mustafizur@northsouth.edu, pritom169@outlook.com, abid.shetu@northsouth.edu,
rashedur.rahman@northsouth.edu
Abstract—The use of technology and science in Healthcare has
made services widely available while helping to ensure the best
possible care. Now that medical data are recorded electronically,
data mining provides techniques that help medical practitioners
analyze large amounts of data efficiently and conveniently.
Millions of records are now available, and most of the knowledge
in them would have remained undiscovered without data mining
techniques. In our work, an association rule mining technique is
used to identify hidden patterns of the most commonly occurring
heart diseases among Bangladeshi people, namely Unstable
Angina (UA), Myocardial Infarction (MI), Coronary Heart
Disease (CHD), etc., and to uncover hidden information by
analyzing the results. Other researchers in this field have used
the classification and clustering methods of data mining, with
which they could predict the chance of heart disease occurring
and cluster records to identify the dependency of one attribute
on another. The trends or patterns for heart diseases may vary
depending on sex, age, socioeconomic condition, demographic
region and so on. The objective of our work is to find those
hidden trends or patterns. We have therefore chosen the
association rule mining technique to find them among patients
according to their age, sex, region and socioeconomic
condition.
Keywords-Data Mining; ARM (Association Rule Mining); Heart
Diseases; Hidden Patterns; Knowledge Discovery
I. INTRODUCTION
Heart diseases are still one of the leading causes of death
around the world. This has raised concern among medical
practitioners, because their mortality rate is surpassing that
of other NCDs (Non-Communicable Diseases). The reason
behind this increase is that people are adopting unhealthy
food habits, uncontrolled lifestyles, smoking, excessive
alcohol intake, etc. In earlier days, doctors relied on a
patient's diagnostic report to identify a disease. However,
identifying a disease alone does not serve the purpose of
preventing it in the long run. Data mining techniques have
therefore emerged, and large amounts of previously
unexamined patient data are now analyzed for knowledge
discovery. This helps doctors to discover more by analyzing
hidden relationships among the attributes that were not
apparent before.
Because our dataset already gives the specific heart disease
name for the determining attribute values, we did not pursue
a predictive task; instead, we chose to study the patterns of
those heart diseases. Some patterns were common to several
of the diseases, and some diseases contributed to the
occurrence of other types of heart disease. In our work, we
have tried to discover the hidden patterns related to the most
commonly occurring heart diseases among Bangladeshi
people and to delineate the reasons behind their occurrence,
taking into account factors such as age, sex and
socioeconomic condition and using the ARM (Association
Rule Mining) technique. As a result, necessary actions or
measures can be taken to better confront these diseases and
to raise awareness among people.
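As an illustration of the kind of rule that ARM produces, the
following minimal Python sketch computes support and confidence
for rules of the form {attributes} -> disease on a handful of
made-up records. The records, the attribute names (age, sex,
income), the thresholds and the function names support,
confidence and mine_rules are all assumptions made for
illustration. They are not taken from our dataset, and the
brute-force enumeration is only a stand-in for a proper frequent
itemset search, not the implementation used in this work.

```python
from itertools import combinations

# Hypothetical toy records (NOT from the paper's dataset): each patient
# is represented as a set of attribute=value items.
RECORDS = [
    {"age=50-60", "sex=M", "income=low", "disease=MI"},
    {"age=50-60", "sex=M", "income=low", "disease=MI"},
    {"age=50-60", "sex=F", "income=mid", "disease=UA"},
    {"age=30-40", "sex=M", "income=low", "disease=MI"},
    {"age=30-40", "sex=F", "income=mid", "disease=CHD"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), records) / base


def mine_rules(records, min_support=0.4, min_confidence=0.8, max_len=2):
    """Enumerate rules 'antecedent -> disease' by brute force.

    Antecedents are combinations of up to `max_len` non-disease items.
    A rule is kept when it meets both thresholds.
    """
    items = sorted(set().union(*records))
    diseases = [i for i in items if i.startswith("disease=")]
    others = [i for i in items if not i.startswith("disease=")]
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(others, size):
            for disease in diseases:
                s = support(set(antecedent) | {disease}, records)
                if s < min_support:
                    continue
                c = confidence(antecedent, {disease}, records)
                if c >= min_confidence:
                    rules.append((antecedent, disease, s, c))
    return rules


if __name__ == "__main__":
    for antecedent, disease, s, c in mine_rules(RECORDS):
        print(f"{set(antecedent)} -> {disease} "
              f"(support={s:.2f}, confidence={c:.2f})")
```

In this sketch a rule is reported only when both its support and
its confidence reach the chosen thresholds, which is what keeps
rare or unreliable attribute combinations out of the result.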
The remainder of the paper is divided into five sections.
Section II discusses related work, that is, the data mining
techniques that have been applied in this area. Section III
presents the proposed methodology used in our work and
discusses the preprocessing and post-processing of the data.
Section IV discusses our findings and analyzes the
experimental results, which take the form of patterns.
Finally, Section V gives our conclusion and Section VI our
future recommendations.
II. RELATED WORKS
The authors in [1] used unsupervised learning techniques
for clustering cardiac patient records. Different clustering
algorithms such as K-means, DBSCAN and K-Medoids were
used to verify the performance of the results. Four of the
attributes BMI, Age, Report Category, LVEF and
LV-Myocardium were used for clustering. They succeeded in
identifying several relationships among the attributes; one of
them is that patients older than 80 are at high risk in terms
of report category regardless of their BMI.
In [2], a K-Nearest Neighbor based classification technique
is combined with a genetic algorithm to classify more
accurately. The authors used genetic search to prune
redundant attributes and ranked the attributes accordingly.
978-1-5386-0664-3/17/$31.00 ©2017 IEEE
The Twelfth International Conference on Digital Information Management (ICDIM 2017)
September 12- 14, 2017, Kyushu University, Fukuoka, Japan.