Knowledge-Based Systems 210 (2020) 106455
journal homepage: www.elsevier.com/locate/knosys

A clinical coding recommender system

Mani Suleiman a,b,*, Haydar Demirhan a, Leanne Boyd c,d, Federico Girosi e,f, Vural Aksakalli a

a School of Science, Mathematical Sciences, RMIT University, Australia
b Rozetta Institute (formerly Capital Markets Cooperative Research Centre, CMCRC), Australia
c Cabrini Institute, Australia
d Eastern Health, Victoria, Australia
e Western Sydney University, Australia
f Digital Health CRC, Australia

article info

Article history:
Received 24 March 2020
Received in revised form 3 September 2020
Accepted 4 September 2020
Available online 17 September 2020

Keywords: Health informatics, Bayesian networks, Clinical coding, Artificial intelligence, Data mining, Recommender systems

abstract

Clinical coding of hospital admissions can erroneously omit diagnosis and procedure codes. A consequence of these omissions is that the condition and treatment of the patient are not fully captured by the entered codes, which can then also impact hospital revenue. One way to prevent these errors is through a real-time recommender system which suggests the addition of codes at the point of coding when it appears they have been omitted. Association analysis uncovers patterns between codes, forming a basis for coding recommendations. Combining association analysis with manual expert validation produces more useful recommendations (we refer to this as the expert validated list), but is labour-intensive. In this study, we propose an approach using Bayesian Networks to determine the conditional relationships between codes. Performance is evaluated using a testing strategy which simulates errors through the random removal of codes from episodes of patient care and counts how many of the removed codes are recommended to coders by each recommender.
Performance is also based on how many recommended codes were not removed (superfluous recommendations) which we seek to minimise. We develop a recommender system which generates 96% of the number of correct recommendations produced by the expert validated list, while having 68% fewer superfluous recommendations. Our proposed methodology provides a high performance recommender while reducing dependence on labour-intensive effort by clinical coding experts.

© 2020 Elsevier B.V. All rights reserved.

* Corresponding author at: School of Science, Mathematical Sciences, RMIT University, Australia.
E-mail addresses: mani.suleiman@rmit.edu.au, manisuleiman.ds@gmail.com (M. Suleiman), haydar.demirhan@rmit.edu.au (H. Demirhan), lboyd@cabrini.com.au, leanne.boyd@easternhealth.org.au (L. Boyd), F.Girosi@westernsydney.edu.au (F. Girosi), vural.aksakalli@rmit.edu.au (V. Aksakalli).

1. Introduction

1.1. Background

Clinical coding staff working for Health Services are responsible for coding episodes of patient care. This involves the transmission of information from patient medical records into a series of standardised codes which represent diagnoses and procedures. These clinical codes are derived from the International Statistical Classification of Diseases and Related Health Problems, currently in its 10th revision (ICD-10) published by the World Health Organisation [1].

Automated coding, also known as Computer-Assisted Coding (CAC), is emerging as a new technology in health information management. CAC processes clinical text from electronic health records (EHRs) and automatically assigns codes. Studies have shown that automated coding is not an error-free process and its performance depends on case complexity [2]. Campbell and Giadresco [3] found that while CAC technology can improve clinical coding accuracy, human intervention will still be required, particularly for quality control. Structured health data is still typically encoded via manual coding.
Almost all health providers in Australia use manual coding from paper medical records. Hence, our methodology is not designed to perform the role of an automated coding system. We aim to provide a tool to support the coding assignment carried out manually by coding professionals through an analysis of historical patient data.

https://doi.org/10.1016/j.knosys.2020.106455
0950-7051/© 2020 Elsevier B.V. All rights reserved.

During the coding process, coders sometimes erroneously and unintentionally omit codes. This can result, for example, from incomplete reading of documents in the patient medical records, such as the discharge summary. Errors can occur due to inexperience and/or oversights caused by time pressure. According to an