Construction and Annotation of a UMLS/SNOMED-based Drug Ontology for Observational Pharmacovigilance

Presented at IDAMAP (Intelligent Data Analysis for bioMedicine and Pharmacology), Washington, DC, 2008

Gary H. Merrill, Patrick B. Ryan, Jeffery L. Painter
GlaxoSmithKline, Research Triangle Park, North Carolina

Abstract

The primary goal of the SafetyWorks project has been the development of an integrated set of methodologies enabling the use of large observational data sources in monitoring and assessing drug safety concerns. To support its analytical and exploratory capabilities, SafetyWorks makes use of two large hierarchically structured ontologies – one for medical conditions, and one for drugs. In this paper we focus on the drug ontology employed in SafetyWorks and on its construction and annotation based on the SNOMED CT and RxNorm subsets of the Unified Medical Language System Metathesaurus. The result is a case study illustrating the value of SNOMED and its integration with UMLS and RxNorm in a critical and innovative drug safety application. We expose sufficient details of our methods to enable others to make use of these methods and to encourage the expanded use of both SNOMED and the UMLS in data exploration and analysis applications, particularly in the area of improving approaches to drug safety.[1]

[1] All references to the Unified Medical Language System, the UMLS Metathesaurus, RxNorm, and the UMLS Lexical Tools are accessible through [NLM, 2008]. The SafetyWorks project began in the spring of 2005 and most of the ontology work was developed on the basis of the 2005-2006 releases of the UMLS and its documentation. However, we have continually updated our ontology as new releases have appeared. An extended argument for the use of multiple observational databases in pharmacoepidemiology and how the methods described here may play a central role in this can be found in [Ryan, 2008]. Some additional details and related work may be found in [Painter et al., 2006], [Ryan et al., 2008], [Ryan and Powell, 2008], [Merrill et al., 2008], and [Painter, 2008].

1 Introduction

The FDA “Guidance for Industry: Good Pharmacovigilance Practices and Pharmacoepidemiologic Assessment” [FDA, 2005] describes pharmacovigilance as “all scientific and data gathering activities relating to the detection, assessment, and understanding of adverse events.” While a drug is in development, one of the primary sources of safety information is clinical trials, but most trials suffer from insufficient sample size and a lack of external validity, so they cannot reliably estimate the risk of potential safety concerns for the target population. Once a medicine has been approved, spontaneous adverse event reporting becomes an increasingly important tool for safety evaluation. Case review remains a key component of the ongoing surveillance of medicines, and the application of disproportionality analysis tools on spontaneous adverse event databases has greatly enhanced the signal detection process. Unfortunately, these spontaneous reporting systems have several limitations that make causal assessments difficult ([Almenoff et al., 2005; Hauben et al., 2005]): voluntary reporting suffers from chronic underreporting and maturation bias, and the unknown nature of underlying populations makes true reporting rates difficult to obtain and use for comparisons. Several recent safety issues have received significant public attention ([Furberg et al., 2006]), resulting in heightened awareness of the challenges of the current safety review process and increased demand for improved methods for understanding the effects of medicines and ensuring patient safety.

Figure 1: The SafetyWorks Process

SafetyWorks is an integrated system for leveraging observational data in support of the identification and evaluation of potential safety concerns of medicines.
This system encompasses a data processing procedure that transforms disparate data sources into a common framework that enables normalized analyses across sources and the integration of automated methods for observational screening and observational evaluation. Figure 1 illustrates how raw data is extracted from the GlaxoSmithKline Healthcare Information Factory (a repository of large databases), normalized and aggregated with the help of annotated medical condition and drug ontologies constructed from the data