Is the inter-patient coincidence of a subclinical disorder related to EHR similarity?
Lawrence W.C. Chan, Iris F.F. Benzie
Department of Health Technology and Informatics
Hong Kong Polytechnic University
Hong Kong, China
wing.chi.chan@inet.polyu.edu.hk
iris.benzie@inet.polyu.edu.hk
Y. Liu
Department of Mechanical Engineering
National University of Singapore,
Singapore
mpeliuy@nus.edu.sg
C.R. Shyu
Informatics Institute
University of Missouri
Columbia, MO 65211-2060, USA
ShyuC@missouri.edu
Abstract—Electronic Health Records (EHRs) provide clinical
evidence for identifying subclinical diseases and supporting
decisions on early intervention. Simple string matching cannot
link clinical terms in patient records that are conceptually similar
but verbally different, which limits the usefulness of EHRs. This
paper proposes a novel ontological similarity matching approach
supported by the Systematized Nomenclature of Medicine Clinical
Terms (SNOMED-CT). The disease terms of a patient record are
transformed into a vector space so that each patient record can be
characterized by a feature vector. The similarity between a new
record and an existing database record is quantified by a kernel
function of their feature vectors, and the matches are ranked by
their similarity scores. To evaluate the proposed approach, medical
history and carotid ultrasound imaging findings were collected from
47 subjects in Hong Kong. The dataset formed 1081 pairs of patient
records, and ROC analysis was used to evaluate and compare the
accuracy of ontological similarity matching and simple string
matching against the presence or absence of carotid plaques
identified by ultrasound examination. Simple string matching rated
the record pairs no better than chance, whereas ontological
similarity matching provided a non-random rating.
Keywords-SNOMED; similarity; clinical decision support;
Electronic Health Record
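The evaluation described in the abstract can be sketched as follows. The 1081 record pairs are all unordered pairs of the 47 subjects (47 × 46 / 2 = 1081), and ROC analysis summarizes how well a similarity score separates positive from negative pairs. This is a minimal sketch, assuming the area under the ROC curve (AUC) is computed through its Mann-Whitney interpretation; the function names and toy scores are hypothetical, and how each pair was labeled from the plaque findings is not specified in this section.

```python
def count_pairs(n: int) -> int:
    """Number of unordered pairs of distinct records: n choose 2."""
    return n * (n - 1) // 2


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    AUC is the probability that a randomly chosen positive pair receives a
    higher similarity score than a randomly chosen negative pair; ties count
    as one half. An AUC of 0.5 means the scores rank pairs at random.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(count_pairs(47))  # 1081
# Hypothetical toy scores and pair labels (1 = positive pair, 0 = negative).
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC close to 0.5 corresponds to the random rating reported for simple string matching; a matcher that ranks pairs meaningfully should yield an AUC clearly above 0.5.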
I. INTRODUCTION
An Electronic Health Record (EHR) system comprises computer
software and hardware components that archive and communicate
patient-centered clinical information throughout all episodes of
care for each patient.
In Hong Kong, the EHR system of the Hong Kong Hospital
Authority (HKHA) is one of the world’s largest integrated
longitudinal EHR systems [1]. General use of EHRs focuses only
on the longitudinal study of the clinical history of an individual
patient.
A. Clinical Decision Support
To help health care professionals make clinical decisions and
maintain quality of care, clinically meaningful search for similar
patient records has become an important feature of an EHR
system. Simple string matching has long been used to search for
patient records that exactly or partially match the given
keywords. However, many conceptually matching records are
likely to be missing from the search results. For example,
“Coronary Artery Disease” and “Myocardial Ischemia” are closely
related medical concepts, but simple string matching can link
neither these two disease terms nor the patient records
containing them. To address this issue, a medical ontology could
be incorporated into the search algorithm.
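The limitation of simple string matching can be illustrated with a minimal sketch, assuming case-insensitive substring matching on free-text records; the records and the function name are hypothetical. A query for “Myocardial Ischemia” returns only the record containing that exact phrase and misses the conceptually related “Coronary Artery Disease” record.

```python
def simple_string_match(records, keyword):
    """Return IDs of records whose text contains the keyword (case-insensitive)."""
    key = keyword.lower()
    return [rid for rid, text in records.items() if key in text.lower()]


# Hypothetical patient records.
records = {
    "P1": "History: Coronary Artery Disease, hypertension",
    "P2": "History: Myocardial Ischemia",
    "P3": "History: Type 2 diabetes",
}

print(simple_string_match(records, "Myocardial Ischemia"))  # ['P2']
print(simple_string_match(records, "Ischemia"))  # ['P2']; P1 is never found
```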
B. Medical Ontology
Systematized Nomenclature of Medicine Clinical Terms
(SNOMED-CT) has been widely adopted as a standard for
formulating medical concepts. As of 2004, SNOMED-CT covered
over 361,800 unique concepts with 975,000 descriptions [2, 3].
SNOMED-CT defines semantic relationships, including extensive
“is-a” and “inverse-is-a” structures. Through these relationships,
the relative closeness between two concepts in a record is
measured by “edge counting” the semantic distance, i.e., the
number of edges along the path connecting the two concepts in
the ontological hierarchy [2, 4-7]. The edge counting method has
been applied to PubMed document clustering, with performance
comparable to alternative methods [7] such as information
content based measures, which associate probabilities with
concepts in the ontology [5, 8].
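Edge counting can be sketched as a shortest-path search in which both “is-a” and “inverse-is-a” links may be traversed. This is a minimal illustration, assuming an unweighted hierarchy and breadth-first search; the toy hierarchy below is hypothetical and does not reproduce actual SNOMED-CT concepts or relationships.

```python
from collections import deque

# Toy "is-a" hierarchy (child -> parents). Illustrative only: these are not
# actual SNOMED-CT concepts, identifiers, or relationships.
IS_A = {
    "Coronary artery disease": ["Heart disease"],
    "Myocardial ischemia": ["Ischemic heart disease"],
    "Ischemic heart disease": ["Heart disease"],
    "Hypertension": ["Hypertensive disorder"],
    "Hypertensive disorder": ["Cardiovascular disorder"],
    "Heart disease": ["Cardiovascular disorder"],
    "Cardiovascular disorder": [],
}


def undirected(graph):
    """Adjacency sets following both "is-a" and "inverse-is-a" links."""
    adj = {c: set() for c in graph}
    for child, parents in graph.items():
        for p in parents:
            adj.setdefault(p, set())
            adj[child].add(p)
            adj[p].add(child)
    return adj


def edge_distance(graph, a, b):
    """Edges on the shortest path between concepts a and b (breadth-first search)."""
    adj = undirected(graph)
    if a not in adj or b not in adj:
        raise KeyError("unknown concept")
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # a and b are not connected


print(edge_distance(IS_A, "Coronary artery disease", "Myocardial ischemia"))  # 3
print(edge_distance(IS_A, "Coronary artery disease", "Hypertension"))  # 4
```

In this toy hierarchy the two ischemia-related terms lie closer together than coronary artery disease and hypertension, which is the kind of relative closeness that string matching cannot capture.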
C. EHR Similarity
Based on a well-established ontology, the vector space model
offers simple parallel evaluation of similarity between queries
and documents through the construction of feature vectors, in
which every term at a particular level of the ontology under
consideration is weighted with a real number in [0, 1]. The
weight is zero if neither the feature term nor any of its
descendants is present in the query/document; otherwise, it is a
positive number reflecting the term's relative importance in the
query/document [9, 10]. The similarity between the given query
and every document can then be calculated, and the search
algorithm can return a list of documents whose similarity exceeds
a threshold. When the query and the documents are patient
records, such EHR similarity is referred to as ontological
similarity in this paper.
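The vector space construction and threshold search can be sketched under several stated assumptions: the features are all concepts at one chosen depth of a toy hierarchy, weights are binary (1.0 if the feature term or a descendant is present, else 0.0), and cosine similarity stands in for the kernel function, whose exact form is not given in this section. The hierarchy, records, threshold, and function names are all hypothetical.

```python
import math

# Toy child -> parent map under a single root "Disorder". Illustrative only:
# not actual SNOMED-CT content.
PARENT = {
    "Cardiovascular disorder": "Disorder",
    "Endocrine disorder": "Disorder",
    "Heart disease": "Cardiovascular disorder",
    "Hypertension": "Cardiovascular disorder",
    "Diabetes mellitus": "Endocrine disorder",
    "Ischemic heart disease": "Heart disease",
    "Myocardial ischemia": "Ischemic heart disease",
    "Coronary artery disease": "Ischemic heart disease",
    "Type 2 diabetes": "Diabetes mellitus",
}


def ancestors_or_self(term):
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


# Feature terms: every concept at one chosen level (depth 2 below the root).
CONCEPTS = set(PARENT) | set(PARENT.values())
FEATURES = sorted(c for c in CONCEPTS if len(ancestors_or_self(c)) - 1 == 2)


def feature_vector(terms):
    """Binary weights (assumption): 1.0 if the feature or a descendant is present."""
    present = {a for t in terms for a in ancestors_or_self(t)}
    return [1.0 if f in present else 0.0 for f in FEATURES]


def cosine(u, v):
    """Cosine similarity, used here as a stand-in kernel function."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_similar(query_terms, database, threshold=0.5):
    """Return (record_id, score) pairs at or above threshold, best first."""
    q = feature_vector(query_terms)
    scored = [(rid, cosine(q, feature_vector(t))) for rid, t in database.items()]
    hits = [(rid, s) for rid, s in scored if s >= threshold]
    return sorted(hits, key=lambda x: x[1], reverse=True)


# Hypothetical patient records.
database = {
    "P1": ["Coronary artery disease", "Hypertension"],
    "P2": ["Type 2 diabetes"],
    "P3": ["Myocardial ischemia"],
}
print(FEATURES)  # ['Diabetes mellitus', 'Heart disease', 'Hypertension']
# P3 (score 1.0) ranks above P1 (about 0.71); P2 falls below the threshold.
print(rank_similar(["Myocardial ischemia"], database))
```

Because “Myocardial ischemia” and “Coronary artery disease” both roll up to the same feature term, record P1 is retrieved for a Myocardial ischemia query even though the strings differ, which simple string matching would miss.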
Simple string matching is a common search algorithm
bundled with many EHR systems. A list of hits is generated
according to the provided search keys. The hits
2011 IEEE 13th International Conference on e-Health Networking, Applications and Services
978-1-61284-696-5/11/$26.00 ©2011 IEEE 177