Extraction of Electronic Health Record Data in a Hospital Setting: Comparison of Automatic and Semi-Automatic Methods Using Anti-TNF Therapy as Model

Thomas Cars 1,2, Bjorn Wettermark 1,3,4, Rickard E. Malmstrom 5, Gunnar Ekeving 6, Bo Vikstrom 7, Ulf Bergman 3,4, Martin Neovius 8, Bo Ringertz 9 and Lars L. Gustafsson 4

1 Public Healthcare Services Committee Administration, Stockholm County Council, Stockholm, Sweden
2 Department of Medical Sciences, Uppsala University, Uppsala, Sweden
3 Centre for Pharmacoepidemiology, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
4 Division of Clinical Pharmacology, Department of Laboratory Medicine, Karolinska Institutet at Karolinska University Hospital, Stockholm, Sweden
5 Division of Clinical Pharmacology, Department of Medicine, Karolinska Institutet at Karolinska University Hospital Solna, Stockholm, Sweden
6 Department of IT Management, Karolinska University Hospital, Stockholm, Sweden
7 TakeCare Cooperation Centre, Karolinska University Hospital, Stockholm, Sweden
8 Clinical Epidemiology Unit, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
9 Division of Rheumatology, Department of Medicine, Karolinska Institutet at Karolinska University Hospital Solna, Stockholm, Sweden

(Received 10 November 2012; Accepted 21 January 2013)

Abstract: There is limited experience with, and there are few methods for, extracting drug therapy data from electronic health records (EHR) in the hospital setting. We therefore developed an automatic and a semi-automatic extraction procedure and evaluated their completeness and consistency when applied to the prescribing and administration of the TNF inhibitor infliximab in the hospital EHR system at Karolinska University Hospital, Sweden. Using the two extraction methods, all infusions of infliximab administered between 2007 and 2010 were extracted from a database linked to the EHR system.
Extracted data included encrypted personal identity number (PIN), date of birth, sex, time of prescription/administration, healthcare units, prescribed/administered dose and time of admission/discharge. The primary diagnosis (ICD-10) for the treatment with infliximab was extracted by linking infliximab infusions to their corresponding treatment episode. A total of 13,590 infusions of infliximab were administered during the period 2007 to 2010. Of these, 13,531 (99.6%) could be linked to a corresponding treatment episode, and a primary diagnosis was extracted for 13,530 infusions. Information on encrypted PIN, date of birth, time of prescription/administration, time of admission/discharge and healthcare unit was complete. Information about sex was missing for one patient only. Calculable information about dosage was extracted for 13,300 (98.3%) of all linked infusions. This methodological study showed the potential to extract drug therapy data in a hospital setting. The semi-automatic procedure produced an almost complete pattern of demographics, diagnoses and dosages for the treatment with infliximab.

Prior to marketing, the initial efficacy and safety of drug therapies are evaluated using strict protocols and randomized controlled trials (RCTs). Although the double-blind RCT is the most reliable design for causal inference, it is usually conducted in selected populations that differ from most patients treated with the drug in clinical practice [1]. Consequently, studies of effectiveness, safety, utilization and cost-effectiveness are needed to bridge the knowledge gap between efficacy and effectiveness of drug therapy [2]. Such studies are presently given high priority by many healthcare authorities [3, 4].

The development of claims databases and health data registers has facilitated quality assessment and research on drug utilization [5–9]. However, most of these databases only cover prescribed drugs in ambulatory care.
For drugs prescribed in hospitals, structured data on clinical outcome and utilization are still lacking in many countries. Specific disease-based quality registries have therefore been established in some countries [10–12]. Unfortunately, these registries do not cover all diseases, and coverage may be incomplete due to voluntary participation.

Because the majority of new and often expensive drug therapies are introduced in hospitals, there is a need for reliable and rapid methods to monitor the effectiveness, safety and utilization of drug therapy using clinical data in individual patients. The introduction of electronic health records (EHR) with integrated prescribing modules and other modules for clinical parameters such as diagnosis, vital signs, laboratory and X-ray data and daily clinical notes offers possibilities to study drug utilization in hospitals [2]. In addition, these clinical parameters may be linked to patient-specific data in other registers, such as data on socio-economic variables, creating databases to study the rational use of medicines.

Author for correspondence: Thomas Cars, Public Healthcare Services Committee Administration, Department of Healthcare Development, Stockholm County Council, Stockholm (fax +46 8 58581070, e-mail thomas.cars@sll.se).
© 2013 The Authors. Basic & Clinical Pharmacology & Toxicology © 2013 Nordic Pharmacological Society
Basic & Clinical Pharmacology & Toxicology, Doi: 10.1111/bcpt.12055

Because the number of databases containing healthcare and patient data has increased rapidly during the past decades, the challenge is now to extract knowledge from these in an efficient way. Traditionally, data mining was a manual process, but because the amount of data has grown exponentially, this process can no longer be maintained manually. Knowledge