GALEN Based Formal Representation of ICD10

Gergely Héja a, György Surján b, Gergely Lukácsy c, Péter Pallinger a, Miklós Gergely a

a Budapest University of Technology and Economics, Department of Measurement and Information Systems
b National Institute for Strategic Health Research
c Budapest University of Technology and Economics, Department of Computer Science and Information Theory

Abstract

The authors present a formal representation of ICD10 based on GALEN CRM. The goal of the work is to create a tool that supports the coding of clinical diagnoses to ICD10. The formal representation of the first two chapters of ICD10 is almost complete. The paper presents the main aspects of the modelling and the problems encountered. The constructed ontology has been converted to OWL, and a test system has been implemented in Prolog to verify the feasibility of the approach. The system successfully identified diseases in medical records from gastrointestinal oncology. The classifier module is still under development.

Keywords: Ontology, ICD10, GALEN

1. Introduction

Indexing of medical diagnoses is a difficult and error-prone task. Providing assistance to manual coding has been an important research area in medical informatics for many decades [1], yet the problem remains unsolved. Computer-assisted coding systems can be classified into two basic groups. Statistical systems do not “know” anything about the coding system or the natural language; they classify diagnoses based on statistical features of the training samples [2, 3]. Such systems are language-independent and easy to implement, since only well-controlled training samples are required. The use of thesauri could significantly enhance the performance of such systems [4]. The drawback of this approach is that it can only cope with problems more or less covered by the training sample.

Knowledge-intensive systems formally represent both the coding system and the clinical text to be coded.
The creation of the knowledge base is a resource-intensive task, but the knowledge-based formal representation of medical narratives can support the reuse of information in various ways (clinical decision support, communication between different EPR systems, etc.). When the knowledge base describes both the clinical concepts and those of the coding system, the system can infer the possible codes even when the clinical expression uses different terms, or even different concepts, than the code category.

This paper presents a knowledge-intensive method for assisting ICD10 [5] coding. Both manual and computer-assisted coding processes may use clinical diagnoses as input information. This is a rational constraint (although it has some drawbacks [6]), because processing the whole patient record would require a very complex model. In cases where the diagnosis is not specific enough, the user should consult the patient record.

Connecting Medical Informatics and Bio-Informatics, R. Engelbrecht et al. (Eds.), ENMI, 2005, p. 707
Section 9: Terminologies, Ontologies, Standards and Knowledge Engineering
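The inference described in the introduction, where a code is found even though the clinical expression uses different terms than the code category, can be illustrated with a subsumption check over a small is-a hierarchy. The following is a minimal Python sketch under stated assumptions, not the authors' Prolog system: the concept names, the `Disease` structure, and the functions `subsumes` and `candidate_codes` are hypothetical and are not taken from GALEN CRM. Only the two ICD10 category codes (C16, malignant neoplasm of stomach; C18, malignant neoplasm of colon) are real.

```python
from dataclasses import dataclass

# Toy is-a hierarchy (child -> parent). All concept names are hypothetical
# illustrations and are NOT taken from GALEN CRM.
IS_A = {
    "Adenocarcinoma": "MalignantNeoplasm",
    "MalignantNeoplasm": "Neoplasm",
    "GastricAntrum": "Stomach",
    "SigmoidColon": "Colon",
    "Stomach": "DigestiveOrgan",
    "Colon": "DigestiveOrgan",
}


def subsumes(general: str, specific: str) -> bool:
    """Return True if `specific` is `general` or one of its descendants."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = IS_A.get(node)  # walk one step up; None at the root
    return False


@dataclass(frozen=True)
class Disease:
    """A disease described by its pathology and its anatomical location."""
    pathology: str
    location: str


# Formal definitions of two ICD10 categories in the toy model (assumed shape).
CODE_DEFINITIONS = {
    "C16": Disease("MalignantNeoplasm", "Stomach"),
    "C18": Disease("MalignantNeoplasm", "Colon"),
}


def candidate_codes(diagnosis: Disease) -> list[str]:
    """Return every code whose definition subsumes the diagnosis."""
    return sorted(
        code
        for code, definition in CODE_DEFINITIONS.items()
        if subsumes(definition.pathology, diagnosis.pathology)
        and subsumes(definition.location, diagnosis.location)
    )


if __name__ == "__main__":
    # "Adenocarcinoma of the gastric antrum" shares no term with the C16 rubric.
    print(candidate_codes(Disease("Adenocarcinoma", "GastricAntrum")))
```

In this sketch the diagnosis is matched to C16 because both of its components are descendants of the components in the category definition. An unspecific diagnosis such as `Disease("Neoplasm", "Stomach")` matches no code, which corresponds to the case where the user should consult the patient record.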