Is the inter-patient coincidence of a subclinical disorder related to EHR similarity?

Lawrence W.C. Chan, Iris F.F. Benzie
Department of Health Technology and Informatics
Hong Kong Polytechnic University
Hong Kong, China
wing.chi.chan@inet.polyu.edu.hk
iris.benzie@inet.polyu.edu.hk

Y. Liu
Department of Mechanical Engineering
National University of Singapore, Singapore
mpeliuy@nus.edu.sg

C.R. Shyu
Informatics Institute
University of Missouri
Columbia, MO 65211-2060, USA
ShyuC@missouri.edu

Abstract-Electronic Health Records (EHRs) provide clinical evidence for identifying subclinical diseases and supporting decisions on early intervention. Simple string matching cannot link clinical terms in patient records that are conceptually similar but verbally different, which limits the usefulness of EHRs. This paper proposes a novel ontological similarity matching approach supported by the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT). The disease terms of a patient record are transformed into a vector space so that each patient record can be characterized by a feature vector. The similarity between a new record and an existing database record is quantified by a kernel function of their feature vectors, and the matches are ranked by their similarity scores. To evaluate the proposed matching approach, medical histories and carotid ultrasonic imaging findings were collected from 47 subjects in Hong Kong. The dataset formed 1081 pairs of patient records, and ROC analysis was used to evaluate and compare the accuracy of ontological similarity matching and simple string matching against the presence or absence of carotid plaques identified in ultrasound examination. Simple string matching was found to rate the record pairs no better than randomly, whereas ontological similarity matching provided a non-random rating.

Keywords-SNOMED; similarity; clinical decision support; Electronic Health Record

I. INTRODUCTION

An Electronic Health Record (EHR) system comprises computer software and hardware components that archive and communicate patient-centered clinical information throughout each patient's episodes of care. In Hong Kong, the EHR system of the Hong Kong Hospital Authority (HKHA) is one of the world’s largest integrated longitudinal EHR systems [1]. In general use, an EHR supports only the longitudinal study of an individual patient's clinical history.

A. Clinical Decision Support

To help health care professionals make clinical decisions and maintain quality of care, a clinically meaningful search for similar patient records is becoming an important feature of an EHR system. Simple string matching has long been used to search for patient records that exactly or partially match the given keywords. However, a large portion of the conceptually matching records are likely to be missing from the search results. For example, “Coronary Artery Disease” and “Myocardial Ischemia” are closely related medical concepts, but simple string matching cannot link these two disease terms, nor the patient records containing them. To address this issue, a medical ontology could be incorporated into the search algorithm.

B. Medical Ontology

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) has been widely adopted as a standard for formulating medical concepts. As of 2004, SNOMED-CT covered over 361800 unique concepts with 975000 descriptions [2, 3]. SNOMED-CT defines semantic relationships, including extensive “is-a” and “inverse-is-a” structures. Through these defined relationships, the relative closeness between two concepts in a record is measured by “edge counting”, that is, by taking the number of edges along the connecting path in the ontological hierarchy as the semantic distance [2, 4-7].
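As an illustration of edge counting, the following minimal sketch computes the shortest-path distance between two concepts by breadth-first search. The hierarchy `IS_A` is a hypothetical toy example and does not reproduce actual SNOMED-CT content; the function names `build_graph` and `edge_count` are likewise assumptions made for this sketch, not part of the method described in this paper.

```python
from collections import deque

# Hypothetical toy "is-a" hierarchy (child -> parents).
# Illustrative only: this is NOT real SNOMED-CT content.
IS_A = {
    "Heart disease": ["Cardiovascular disease"],
    "Vascular disease": ["Cardiovascular disease"],
    "Ischemic heart disease": ["Heart disease"],
    "Myocardial ischemia": ["Ischemic heart disease"],
    "Coronary artery disease": ["Ischemic heart disease"],
    "Carotid artery stenosis": ["Vascular disease"],
}


def build_graph(is_a):
    """Build an undirected adjacency map so that a path may follow
    both "is-a" and "inverse-is-a" links."""
    graph = {}
    for child, parents in is_a.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def edge_count(graph, source, target):
    """Return the number of edges on the shortest path between two
    concepts, or None if either is unknown or they are unconnected."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


GRAPH = build_graph(IS_A)
print(edge_count(GRAPH, "Coronary artery disease", "Myocardial ischemia"))
print(edge_count(GRAPH, "Coronary artery disease", "Carotid artery stenosis"))
```

In this toy hierarchy the two terms from the example above share a parent and so lie two edges apart, while a concept in a different branch is further away. String matching would treat all three terms as equally unrelated.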
The edge counting method has been applied to PubMed document clustering, where its performance was comparable to that of alternative methods [7], such as the information-content-based measure, which associates probabilities with concepts in the ontology [5, 8].

C. EHR Similarity

Based on a well-established ontology, the vector (space) model offers a simple, parallel evaluation of the similarity between queries and documents through the construction of feature vectors, in which every term at a particular level of the considered ontology is weighted with a real number in [0,1]. The weight is zero if neither the feature term nor any of its descendants is present in the query or document; otherwise, it is a positive number reflecting the term's relative importance in the query or document [9, 10]. The similarity between the given query and every document can then be calculated, and the search algorithm can return a list of documents whose similarity exceeds a threshold. When the query and the document are both patient records, this EHR similarity is referred to as ontological similarity in this paper.

Simple string matching is a common search algorithm bundled with many EHR systems. A list of hits is generated according to the provided search keys. The hits
2011 IEEE 13th International Conference on e-Health Networking, Applications and Services 978-1-61284-696-5/11/$26.00 ©2011 IEEE 177