Analyzing Patterns of Numerously Occurring Heart Diseases Using Association Rule Mining

K.M. Mehedi Hasan Sonet, Md. Mustafizur Rahman, Pritom Mazumder, Abid Reza, Rashedur M Rahman
Department of Electrical and Computer Engineering
North South University
sonet.hasan@northsouth.edu, rahman.mustafizur@northsouth.edu, pritom169@outlook.com, abid.shetu@northsouth.edu, rashedur.rahman@northsouth.edu

Abstract—The use of science and technology in healthcare has made services available to everyone while helping to ensure the best possible care. Now that electronic record systems exist, data mining provides techniques that help medical practitioners analyze large amounts of data efficiently and conveniently. Millions of records are now available, and most of them would have remained unexplored had data mining techniques not been introduced. In our work, an association rule mining technique is used to identify hidden patterns in the most commonly occurring heart diseases among Bangladeshi people, namely Unstable Angina (UA), Myocardial Infarction (MI), Coronary Heart Disease (CHD) and others, and to uncover the hidden information by analyzing the results. Other researchers in this field have mainly used the classification and clustering methods of data mining, with which they predicted the chance of heart disease occurring and clustered records to identify the dependency of one attribute on another. The trends or patterns of heart diseases may vary with sex, age, socioeconomic condition, demographic region and so on. The objective of our work is to find those hidden trends or patterns. We have therefore chosen association rule mining to find them among patients according to their age, sex, region and socioeconomic condition.
Keywords-Data Mining; ARM (Association Rule Mining); Heart Diseases; Hidden Patterns; Knowledge Discovery

I. INTRODUCTION

Heart diseases are still one of the leading causes of death around the world. They have raised concern among medical practitioners because their mortality rate is surpassing that of other NCDs (Non-Communicable Diseases). The reason behind this increase is that people are adopting unhealthy food habits, uncontrolled lifestyles, smoking, excessive alcohol intake and so on. In earlier days, doctors relied on a patient's diagnostic report to identify a given disease. However, identifying such diseases alone does not serve the purpose of preventing them in the long run. Hence data mining techniques have emerged, and large volumes of previously unexplored patient data are now analyzed for knowledge discovery. This helps doctors discover more by analyzing relationships among the attributes that were hidden before.

Because our dataset already records the specific heart disease associated with the determining attribute values, we did not pursue a predictive task; instead, we opted to learn the patterns of those heart diseases. Some patterns were common to several of the diseases, and some diseases influenced the occurrence of other types of heart disease. In our work, we have tried to discover the hidden patterns related to the most commonly occurring heart diseases among Bangladeshi people and to delineate the reasons behind their occurrence, taking into account relevant factors such as age, sex and socioeconomic condition and using the ARM (Association Rule Mining) technique. With these patterns, necessary actions or measures can be taken to confront the diseases better and to raise awareness among people. The remainder of the paper is organized as follows.
Section II discusses the data mining techniques that have been applied in this area as related work. Section III presents the proposed methodology used in our work and discusses the preprocessing and post-processing of the data. Section IV discusses our findings and the analysis of the experimental results, which take the form of patterns. Finally, Section V gives our conclusion and Section VI our future recommendations.

II. RELATED WORKS

The authors in [1] used unsupervised learning techniques to cluster cardiac patient records. Different clustering algorithms, namely K-means, DBSCAN and K-Medoids, were used to verify the performance of the results. The attributes BMI, Age, Report Category, LVEF and LV-Myocardium were used for clustering. The authors succeeded in identifying several relationships among the attributes; one of them was that patients older than 80 are at high risk in terms of report category, whatever their BMI.

In [2], a K-Nearest Neighbor based classification technique is used together with a genetic algorithm to classify more accurately. Genetic search is used to prune the redundant attributes, and the attributes are ranked accordingly.

978-1-5386-0664-3/17/$31.00 ©2017 IEEE
The Twelfth International Conference on Digital Information Management (ICDIM 2017), September 12-14, 2017, Kyushu University, Fukuoka, Japan.