Abstract -- Although significant progress has been made in the diagnosis and treatment of coronary heart disease (CHD), further investigation is still needed. The objective of this study was to develop a data mining system using association analysis based on the apriori algorithm for the assessment of heart event related risk factors. The events investigated were: myocardial infarction (MI), percutaneous coronary intervention (PCI), and coronary artery bypass graft surgery (CABG). A total of 369 cases were collected from the Paphos CHD Survey, most of them with more than one event. The most important risk factors, as extracted from the association rule analysis, were: sex (male), smoking, high-density lipoprotein, glucose, family history, and history of hypertension. Most of these risk factors were also extracted by our group in a previous study using the C4.5 decision tree algorithm, and by other investigators. Further investigation with larger data sets is still needed to verify these findings.

Keywords: data mining, coronary heart disease, risk of heart event, APRIORI algorithm, risk factors, rule extraction.

I. INTRODUCTION

In coronary heart disease (CHD), the coronary arteries that supply the heart muscle with oxygen and nutrients become narrowed by atherosclerotic stenotic lesions. This restricts the supply of blood and oxygen to the heart, particularly during exertion, when the myocardial metabolic demands are increased [1]. Extensive clinical and statistical studies have identified several factors that increase the risk of coronary heart disease, including acute myocardial infarction [2], [3]. The more risk factors a person has, the greater the risk of developing coronary heart disease. Also, the greater the severity of each risk factor, the greater the overall risk. However, this knowledge has not yet led to a significant reduction in CHD incidence.

There are several factors that contribute to the development of a coronary heart event.
These risk factors may be classified into two categories, non-modifiable and modifiable [4]. The first category includes factors that cannot be altered by intervention, such as age, gender, family history and genetic attributes. Modifiable risk factors are those for which either treatment is available or in which alterations in behavior can reduce the proportion of the population exposed. Established, modifiable risk factors for CHD currently include smoking, elevated cholesterol and triglycerides, elevated LDL and low HDL, hypertension, and diabetes [5], [6]. A number of other well-established risk factors and protective factors are also modifiable, while several other known factors are not yet considered to be of great importance.

---------------------------------------------------------------------
M. Karaolis, L. Papaconstandinou and C. Pattichis are with the Department of Computer Science, University of Cyprus, Nicosia, Cyprus (e-mail: karaolis@acm.org; pattichi@ucy.ac.cy).
J.A. Moutiris is a cardiologist at the Department of Cardiology, Paphos General Hospital, Paphos, Cyprus and coordinator of the Paphos CHD Survey (e-mail: moutiris@ucy.ac.cy).
---------------------------------------------------------------------

The objective of this study was to develop a data mining system for the assessment of CHD related risk factors using the apriori algorithm for extracting rules. A previous study by our group on the same dataset showed that important risk factors are modifiable [7]; a patient's risk of CHD may therefore be reduced through proper control of these factors, as already reported by several major studies, including the EUROASPIRE I, II, and III surveys [8]-[11]. The first and second EUROASPIRE surveys showed high rates of modifiable cardiovascular risk factors in patients with coronary heart disease, and indicated that preventive measures might decrease cardiovascular risk [9], [10].
The third EUROASPIRE survey, conducted in 2006-07 in 22 countries to see whether preventive cardiology in Europe had improved 10 years later, showed that the major risk factors (smoking, hypertension, and obesity) had not decreased [11]. The EUROASPIRE III study found that, despite a substantial increase in the use of antihypertensive and lipid-lowering drugs, blood pressure management remained unchanged and almost half of all patients remained above the recommended lipid targets, which it interprets as reflecting a natural reluctance of people to change their lifestyles [11].

Data mining has also been employed in several studies, in which different algorithms were used for rule extraction and evaluation, such as C4.5 decision trees [7], [12] and the Apriori algorithm [13], [14].

The rest of the paper is organized as follows. Section II describes the Material and Methods, Section III the Results and Discussion, and Section IV the Conclusions.

Association Rule Analysis for the Assessment of the Risk of Coronary Heart Events
M. Karaolis, Student Member, IEEE, J.A. Moutiris, FESC, L. Papaconstantinou, C.S. Pattichis, Senior Member, IEEE
Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, September 2-6, 2009, Minneapolis, Minnesota, USA

II. MATERIAL AND METHODS

A. Data Collection

Data from 1200 consecutive CHD patients were collected between 2003 and 2006 (300 patients each year) according to a pre-specified protocol, under the supervision of the participating cardiologist (Dr J.A. Moutiris) at the Paphos General Hospital of Cyprus. Patients met at least one of the following criteria on enrollment: a history of myocardial infarction (MI), percutaneous coronary intervention (PCI), or coronary artery bypass graft surgery (CABG). Data for each patient were collected under the following groups (see also Table I):

i. Clinical factors: Age, Sex, Smoking (SMBEF), systolic blood pressure (SBP) mmHg, diastolic