Use of Genomic Variants in Informatics for Integrating Biology and the Bedside (i2b2)
Lori C. Phillips, MS¹, Simon Minovitsky², Igor Ratnere², Inna Dubchak, Ph.D.²,³, Isaac Kohane, MD, Ph.D.⁴, Shawn N. Murphy, MD, Ph.D.⁵
¹Partners Healthcare Systems, Charlestown, MA; ²DOE Joint Genome Institute, Walnut Creek, CA; ³Lawrence Berkeley National Laboratory, Berkeley, CA; ⁴Children’s Hospital, Boston, MA; ⁵Massachusetts General Hospital, Boston, MA

Abstract
An electronic Clinical Research Chart (CRC) has been developed under the NIH Roadmap National Centers for Biomedical Computing (NCBC) Informatics for Integrating Biology and the Bedside (i2b2) effort to organize and integrate clinical data, trials data, and genomic data with knowledge annotations. This paper describes a method to classify genomic variants within the CRC. A set of new tools to explore and annotate these variants has been developed and is demonstrated. Finally, the inherent architecture of the CRC permits translational querying and computational analysis of genomic and clinical data.

Introduction
Informatics for Integrating Biology and the Bedside (i2b2) is one of the initiatives sponsored by the NIH Roadmap NCBC (http://www.ncbc.org). A primary goal of i2b2 is to provide clinical investigators with a cohesive set of software tools to collect and manage clinical research data in the genomics age: a software suite to construct and manage the modern clinical research chart. The i2b2 Hive is a collection of interoperable software objects, or “cells,” that communicate through scalable web services. A number of core cells provide basic services that access the data in the CRC clinical repository and present it in a form the researcher can consume. The architecture also allows additional cells to be developed for specific forms of analysis and then integrated into the hive to form a cohesive whole.
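To make the cell idea concrete, the sketch below models a hive in-process as a rough illustration only. Everything in it is an assumption: the `Hive` class, the cell names `CRC` and `PatientCount`, and the dictionary message format are hypothetical, and a direct function call stands in for the web-service request that real i2b2 cells make. What it shows is the extension point described above: an add-on analysis cell reaches the repository only through hive messages, never through CRC internals.

```python
from typing import Callable, Dict

# A cell handler takes a request message and returns a response message.
Handler = Callable[[dict], dict]


class Hive:
    """Toy in-process model of a hive: cells register under a name and
    exchange dict messages. (Hypothetical; real i2b2 cells use web services.)"""

    def __init__(self) -> None:
        self._cells: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        if name in self._cells:
            raise ValueError(f"cell {name!r} is already registered")
        self._cells[name] = handler

    def send(self, cell_name: str, message: dict) -> dict:
        # Stand-in for a web-service request to the named cell.
        if cell_name not in self._cells:
            raise KeyError(f"no cell named {cell_name!r}")
        return self._cells[cell_name](message)


def make_crc_cell(facts):
    """Core 'repository' cell: answers which patients have a given concept."""
    def handle(message: dict) -> dict:
        concept = message["concept"]
        return {"patients": sorted({p for p, c in facts if c == concept})}
    return handle


def make_count_cell(hive: Hive):
    """Add-on analysis cell: uses the CRC only through hive messages."""
    def handle(message: dict) -> dict:
        reply = hive.send("CRC", {"concept": message["concept"]})
        return {"count": len(reply["patients"])}
    return handle


hive = Hive()
hive.register("CRC", make_crc_cell([("P1", "C1"), ("P2", "C1"), ("P2", "C2")]))
hive.register("PatientCount", make_count_cell(hive))
print(hive.send("PatientCount", {"concept": "C1"}))  # {'count': 2}
```

Because the counting cell depends only on the `send` contract, the repository cell could be replaced without touching it, which is the kind of loose coupling the hive design aims for.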
Exploration of genomic variants
Genomic variant exploration has two contexts within the scope of i2b2. First, the variants must be named and classified within the Ontology cell so that users can easily discover the variants they wish to query against the CRC. Second, the workbench should provide useful information and annotations about each variant. When designing a method to represent genomic variant data in the CRC, we analyzed the properties available to us from both the literature and our genomic lab reporting system (Figure 1).

Figure 1. An i2b2 SNP vocabulary term can be created after review of reference data or of data produced in the lab by a post-generation sequencer. The variants are named and classified for insertion into the CRC. Existing i2b2 cells permit querying and analysis of the variants within the i2b2 framework. A new i2b2 cell has been created to annotate and compare these variants.
[Figure 1 workflow: Gather SNP data from genomic lab reporting / Gather SNPs from reference data → Create i2b2 SNP vocabulary term → Use vocabulary to query against CRC / Use vocabulary to query VISTA annotations]
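The Figure 1 workflow (gather SNPs from two sources, turn them into vocabulary terms, then query with those terms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SnpTerm` record, the chromosome/gene/rs ID hierarchy, the `\Genomics\SNP\` path prefix, and the rule that a reference record takes precedence over a lab record for the same rs ID are all hypothetical. The backslash-delimited path imitates the style of i2b2 ontology terms, and a list of (patient, rs ID) pairs stands in for CRC observation facts.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SnpTerm:
    """Hypothetical vocabulary term for one SNP."""
    rs_id: str        # dbSNP-style identifier
    gene: str         # gene the variant is classified under
    chromosome: str
    position: int
    source: str       # "reference" or "lab"

    @property
    def path(self) -> str:
        # Hypothetical hierarchy: Genomics > SNP > chromosome > gene > rs ID,
        # written as a backslash-delimited path in the style of i2b2 terms.
        return f"\\Genomics\\SNP\\chr{self.chromosome}\\{self.gene}\\{self.rs_id}\\"


def build_vocabulary(lab_rows: Iterable[dict], reference_rows: Iterable[dict]) -> Dict[str, SnpTerm]:
    """Merge SNPs from reference data and lab reports, one term per rs ID.

    Reference rows are loaded first, so a lab row repeating an rs ID already
    seen does not create a duplicate term.
    """
    vocab: Dict[str, SnpTerm] = {}
    for source, rows in (("reference", reference_rows), ("lab", lab_rows)):
        for row in rows:
            if row["rs_id"] not in vocab:
                vocab[row["rs_id"]] = SnpTerm(
                    rs_id=row["rs_id"],
                    gene=row["gene"],
                    chromosome=str(row["chromosome"]),
                    position=int(row["position"]),
                    source=source,
                )
    return vocab


def query_patients(facts: Iterable[Tuple[str, str]], vocab: Dict[str, SnpTerm], path_prefix: str) -> List[str]:
    """Return patients with any variant whose term path starts with path_prefix.

    facts holds (patient_id, rs_id) pairs, standing in for CRC observations.
    """
    wanted = {rs for rs, term in vocab.items() if term.path.startswith(path_prefix)}
    return sorted({pid for pid, rs in facts if rs in wanted})


# Placeholder identifiers, not real variants.
reference = [{"rs_id": "rs0001", "gene": "GENE_A", "chromosome": 1, "position": 1000}]
lab = [
    {"rs_id": "rs0001", "gene": "GENE_A", "chromosome": 1, "position": 1000},
    {"rs_id": "rs0002", "gene": "GENE_B", "chromosome": 2, "position": 2000},
]
vocab = build_vocabulary(lab, reference)
facts = [("P1", "rs0001"), ("P2", "rs0002"), ("P3", "rs0001")]
print(query_patients(facts, vocab, "\\Genomics\\SNP\\chr1\\"))  # ['P1', 'P3']
```

Querying by path prefix lets a user select a whole branch (every SNP on a chromosome or in a gene) with one term; ending the prefix with a backslash keeps a query on chr1 from also matching chr10.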