International Journal of Computer Applications (0975 – 8887)
Volume 108 – No 15, December 2014

Extracting Diagnosis Patterns in Electronic Medical Records using Association Rule Mining

Stephen M. Kang’ethe
School of Computing and Informatics, University of Nairobi
P. O. Box 30197 – 00100 Nairobi, Kenya

Peter W. Wagacha
School of Computing and Informatics, University of Nairobi
P. O. Box 30197 – 00100 Nairobi, Kenya

ABSTRACT
Data mining technologies have been used extensively in the commercial retail sector to extract information from “big data” warehouses. In healthcare, data mining has also been used in various ways, which we explore. The voluminous data generated by medical systems forms a good basis for discovering interesting patterns that may aid decision making and save lives, as well as reduce research costs and possibly morbidity prevalence. On this basis, we set out to implement a concept that uses association rule mining to find possible diagnostic associations in patients’ medical records spanning multiple contacts of care. The dataset was obtained from Practice Fusion’s open research data, which contained over 98,000 patient clinic visits from all American states. Using an implementation of the classical Apriori algorithm, we mined for patterns arising from medical diagnosis data. The diagnosis data was based on ICD-9 coding, which helped limit the set of possible diagnostic groups for the analysis. We then subjected the results to domain expert opinion. The panel of experts validated, with a concurrence rate of 90%, some of the most common associations, which had minimum confidence levels of between 56% and 76%, whereas other associations elicited debate among the medical practitioners.
The results of our research showed that association rule mining can be used not only to confirm what is already known from health data in the form of comorbidity patterns, but also to generate some very interesting disease diagnosis associations. These associations can provide a good starting point for further studies by medical researchers to explain patterns that are seemingly unknown or peculiar in the populations concerned.

Keywords
Medical Diagnosis Patterns, Electronic Medical Records, Health Informatics, Association Rule Mining, Apriori.

1. INTRODUCTION
The health sector worldwide has been involved in the automation of medical records. Medical practitioners have had to learn new ways of capturing their findings and treatment plans for their patients after years of doing so on paper. Health institutions that adopted Electronic Medical Record (EMR) systems did so in their own ways, owing to the lack of standardization of such implementations in years past. In recent times, however, world governing institutions such as WHO and ISO have embraced the advent of Health Information Systems (HIS) and spearheaded the development of standards that were hitherto unavailable to implementers of health systems. These standards make it easy not only to capture and share data across multiple and seemingly disparate implementations, but also to query, analyze and extract useful statistics from the data entered in those systems.

The need for EMR systems has been driven by several factors, including the complexity of medical data, the influx of patients and the need for proper recording of health data. When EMR systems are well developed, they are likely to positively impact the quality and reliability of health data, as well as standardized reporting [1].
The standards of particular interest in our research are the International Classification of Diseases (ICD) standards (both ICD-9 and ICD-10) and the HL7 health information interchange standards.

In their work, Fast algorithms for mining association rules in large databases [2], the authors presented an algorithm, known as Apriori, for discovering association rules within large, primarily transactional, sales databases. This algorithm was a development of previously known algorithms for itemset mining and association rule discovery. We take a brief look at how this algorithm works and at its known uses in commercial databases, particularly retail sales databases, for which the authors state the algorithm was originally conceived. We also explore the benefits of using this algorithm over other known algorithms for association rule mining.

The availability of standardized medical data creates a large pool of data with a lot of hidden and potentially useful information. Using association rule mining, and the Apriori algorithm in particular, we seek to uncover the hidden diagnosis patterns that could be present within the data made available by these systems. We also intend to discover strong rules (relationships) that indicate multimorbidity trends in the EMR data, using varying measures of interestingness.

2. LITERATURE REVIEW
2.1 International Statistical Classification of Diseases (ICD)
The International Statistical Classification of Diseases is the standard diagnostic tool for epidemiology, health management and clinical purposes [3]. It contains standard diagnostic codes that attempt to cover all known causes of morbidity and mortality. ICD-10 is the current standard, with the 2010 version being the latest, and it is in the process of replacing the widely used ICD-9.
WHO also states that the 11th revision of the classification (ICD-11) is in progress and is set to continue until the year 2017 [4]. ICD-9 codes are three to five characters long. The first character is either numeric or alphabetic (the letters E or V only) and all other characters are numeric.
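
The code format described above can be expressed as a simple shape check. The following is a minimal sketch in Python, not part of the original study: the function name `looks_like_icd9` is hypothetical, and it assumes that a decimal point, where present in a recorded code, may be dropped before checking. It tests only the format stated in the text (three to five characters, the first a digit or the letter E or V, the rest digits); it does not confirm that a code exists in the ICD-9 code set.

```python
import re

# Shape stated in the text: first character is a digit or the letter E or V,
# followed by two to four digits, giving three to five characters in total.
_ICD9_SHAPE = re.compile(r"[0-9EV][0-9]{2,4}")


def looks_like_icd9(code: str) -> bool:
    """Return True if `code` has the ICD-9 shape described in the text.

    Assumption (not from the paper): surrounding whitespace, letter case and
    a single decimal point are ignored, so "250.00" is treated as "25000".
    """
    normalized = code.strip().upper().replace(".", "", 1)
    return _ICD9_SHAPE.fullmatch(normalized) is not None


if __name__ == "__main__":
    # Illustrative inputs only; these are not taken from the study's dataset.
    samples = ["250.00", "V10", "e849.5", "A123", "12", "123456"]
    for sample in samples:
        print(sample, looks_like_icd9(sample))
```

A check of this kind could be applied before mining to discard malformed diagnosis entries, so that only well-formed codes are treated as items.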