Extraction of Electronic Health Record Data in a Hospital
Setting: Comparison of Automatic and Semi-Automatic Methods
Using Anti-TNF Therapy as Model
Thomas Cars (1,2), Björn Wettermark (1,3,4), Rickard E. Malmström (5), Gunnar Ekeving (6), Bo Vikström (7), Ulf Bergman (3,4), Martin Neovius (8), Bo Ringertz (9) and Lars L. Gustafsson (4)
(1) Public Healthcare Services Committee Administration, Stockholm County Council, Stockholm, Sweden
(2) Department of Medical Sciences, Uppsala University, Uppsala, Sweden
(3) Centre for Pharmacoepidemiology, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
(4) Division of Clinical Pharmacology, Department of Laboratory Medicine, Karolinska Institutet at Karolinska University Hospital, Stockholm, Sweden
(5) Division of Clinical Pharmacology, Department of Medicine, Karolinska Institutet at Karolinska University Hospital Solna, Stockholm, Sweden
(6) Department of IT Management, Karolinska University Hospital, Stockholm, Sweden
(7) TakeCare Cooperation Centre, Karolinska University Hospital, Stockholm, Sweden
(8) Clinical Epidemiology Unit, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
(9) Division of Rheumatology, Department of Medicine, Karolinska Institutet at Karolinska University Hospital Solna, Stockholm, Sweden
(Received 10 November 2012; Accepted 21 January 2013)
Abstract: Experience with, and methods for, extracting drug therapy data from electronic health records (EHR) in the hospital setting are limited. We therefore developed an automatic and a semi-automatic extraction procedure and evaluated their completeness and consistency when applied to the prescribing and administration of the TNF inhibitor infliximab, using the hospital EHR system at Karolinska University Hospital, Sweden. Using the two extraction methods, all infusions of infliximab administered between 2007 and 2010 were extracted from a database linked to the EHR system. Extracted data included encrypted personal identity number (PIN), date of birth, sex, time of prescription/administration, healthcare units, prescribed/administered dose and time of admission/discharge. The primary diagnosis (ICD-10) for the treatment with infliximab was extracted by linking infliximab infusions to their corresponding treatment episode. A total of 13,590 infusions of infliximab were administered during 2007 to 2010. Of these, 13,531 (99.6%) could be linked to a corresponding treatment episode, and a primary diagnosis was extracted for 13,530 infusions. Information on encrypted PIN, date of birth, time of prescription/administration, time of admission/discharge and healthcare unit was complete. Information about sex was missing for one patient only. Calculable information about dosage was extracted for 13,300 (98.3%) of all linked infusions. This methodological study showed the potential to extract drug therapy data in a hospital setting. The semi-automatic procedure produced an almost complete pattern of demographics, diagnoses and dosages for the treatment with infliximab.
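The linkage and completeness figures reported in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the linkage rule, so the rule used here (same encrypted PIN, administration time falling within the admission/discharge window) is an assumption made only for illustration. The names Episode, Infusion, link_infusion and completeness are hypothetical and do not come from the EHR system or the study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Episode:
    pin: str                      # encrypted personal identity number
    admitted: datetime            # time of admission
    discharged: datetime          # time of discharge
    primary_icd10: Optional[str]  # primary diagnosis, may be missing


@dataclass
class Infusion:
    pin: str                      # encrypted personal identity number
    administered: datetime        # time of administration


def link_infusion(infusion: Infusion, episodes: List[Episode]) -> Optional[Episode]:
    """Return the first episode of the same patient whose admission/discharge
    window contains the administration time, or None if there is no match.
    ASSUMPTION: this linkage rule is illustrative, not the study's method."""
    for ep in episodes:
        same_patient = ep.pin == infusion.pin
        in_window = ep.admitted <= infusion.administered <= ep.discharged
        if same_patient and in_window:
            return ep
    return None


def completeness(part: int, whole: int) -> float:
    """Percentage of `whole` covered by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Counts taken from the abstract.
    print(completeness(13531, 13590))  # infusions linked to an episode
    print(completeness(13300, 13531))  # linked infusions with calculable dose
```

With the counts from the abstract, the two calls reproduce the reported 99.6% (linked infusions) and 98.3% (calculable dosage among linked infusions).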
Prior to marketing, the initial efficacy and safety of drug therapies are evaluated using strict protocols and randomized controlled trials (RCTs). Although the double-blind RCT is the most reliable design for causal inference, it is usually conducted in selected populations different from most patients treated with the drug in clinical practice [1]. Consequently, studies of effectiveness, safety, utilization and cost-effectiveness are needed to bridge the knowledge gap between efficacy and effectiveness of drug therapy [2]. Such studies are presently given high priority by many healthcare authorities [3, 4].
The development of claims databases and health data registers has facilitated quality assessment and research on drug utilization [5–9]. However, most of these databases only cover prescribed drugs in ambulatory care. For drugs prescribed in hospitals, structured data on clinical outcome and utilization are still lacking in many countries. Specific disease-based quality registries have therefore been established in some countries [10–12]. Unfortunately, these registers do not cover all diseases and coverage may be incomplete due to voluntary participation. Because the majority of new and often expensive drug therapies are introduced in hospitals, there is a need for reliable and rapid methods to monitor the effectiveness, safety and utilization of drug therapy using clinical data in individual patients.
The introduction of electronic health records (EHR) with integrated prescribing modules and other modules for clinical parameters such as diagnosis, vital signs, laboratory and X-ray data and daily clinical notes offers possibilities to study drug utilization in hospitals [2]. In addition, these clinical parameters may be linked to patient-specific data in other registers, such as data on socio-economic variables, creating databases to study the rational use of medicines.
Because the number of databases containing healthcare and patient data has increased rapidly during the past decades, the challenge is now to extract knowledge from these in an efficient way. Traditionally, data mining was a manual process, but because the amount of data has grown exponentially, this process can no longer be maintained manually. Knowledge
Author for correspondence: Thomas Cars, Public Healthcare Services
Committee Administration, Department of Healthcare Development,
Stockholm County Council, Stockholm (fax +46 8 58581070, e-mail
thomas.cars@sll.se).
© 2013 The Authors
Basic & Clinical Pharmacology & Toxicology © 2013 Nordic Pharmacological Society
Basic & Clinical Pharmacology & Toxicology Doi: 10.1111/bcpt.12055