Ontology representation and ANOVA analysis of vaccine protection investigation

Yongqun He 1*, Zuoshuang Xiang 1, Thomas Todd 1, Melanie Courtot 2, Ryan Brinkman 2, Jie Zheng 3, Chris Stoeckert 3, James Malon 4, Philippe Rocca-Serra 4, Susanna-Assunta Sansone 4, Jennifer Fostel 5, Larisa N. Soldatova 6, Bjoern Peters 7, Alan Ruttenberg 8

1 University of Michigan, Ann Arbor, USA; 2 British Columbia Cancer Agency, Vancouver, Canada; 3 Center for Bioinformatics, Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA; 4 The European Bioinformatics Institute, Cambridge, UK; 5 Global Health Sector, SRA International, Inc, Durham, NC, USA; 6 Aberystwyth University, Wales, UK; 7 La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA; 8 Science Commons, Cambridge, MA, USA.

ABSTRACT

Motivation: Representing the statistical analysis of experimental data in a semantic framework remains challenging. As a first step towards this goal, we propose an ontological representation of ANOVA statistical analysis. In a vaccine protection use case, 151 data instances from Brucella vaccine protection investigations were collected from the literature and analyzed using ANOVA. Of 16 parameters, 10 were found to contribute to protection at a statistically significant level. Careful study of these instances led to building and validating an OBI-based semantic framework that represents ANOVA formally. Ontology-based representation and statistical analysis of biomedical data allow data consistency checking and data sharing on the Semantic Web.

Contact: yongqunh@med.umich.edu

1 INTRODUCTION

The Ontology for Biomedical Investigations (OBI) is being developed to address the need for a common, integrated ontology for the description of biological and clinical investigations.
OBI has been used in experimental investigations in different communities, for example, Bioinvindex (http://www.ebi.ac.uk/bioinvindex), isa-tools (http://isatab.sourceforge.net/), and IEDB (http://www.immuneepitope.org/). In our recent study, we used OBI and other ontologies to represent an investigation of vaccine protection against influenza viral infection (Brinkman et al., 2010). A vaccine protection investigation measures how efficiently a vaccine or vaccine candidate induces protection against virulent pathogen infection in vivo. While ontology representation of experimental assays in terms of material inputs and data outputs provides a foundation for further data sharing and Semantic Web studies of specific domains, it is still challenging to apply semantic frameworks to statistical analysis of instance data. OntoDM is a newly proposed ontology of data mining that provides a framework and describes entities from the domain of data mining and knowledge discovery. OntoDM is aligned with OBI. The updated OBI includes many statistical terms (e.g., ANOVA, F-test, t-test) and related terms that support statistical analysis.

* To whom correspondence should be addressed.

The community-based Vaccine Ontology (VO; http://www.violinet.org/vaccineontology/) is a biomedical ontology that covers the vaccine domain (He et al., 2009). Development of VO has emphasized classification of vaccines and vaccine components, vaccination investigation, and host responses to vaccines. VO development follows the OBO Foundry principles (Smith et al., 2007). VO uses the Basic Formal Ontology (BFO) (Grenon et al., 2004) as its top-level ontology, and OBI as another upper-level ontology for vaccine investigation. VO uses relations defined primarily by the Relation Ontology (RO) (Smith et al., 2005) and also by OBI and the Information Artifact Ontology (IAO). The close association with these ontologies facilitates data integration and automated reasoning.
In this report, we first introduce our ontology representation of ANOVA statistical analysis, then apply it to investigate Brucella vaccine protection results curated from the literature. Brucella is an intracellular bacterium that causes brucellosis, the most common zoonotic disease worldwide. In this study, we hypothesized that some experimental variables contribute significantly to Brucella vaccine protection efficacy while others do not. Our study indicates that relying on a semantic framework such as OBI and OntoDM is a useful approach to support biomedical statistical data analyses.

2 METHODS

The following methods were applied in this study:

Ontology representation of ANOVA statistical analysis: The analysis of variance (ANOVA) was modeled primarily in OBI. A design pattern was generated. The use case in this study is ANOVA expressed as a linear model.
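
To make the linear-model view of ANOVA concrete, the following minimal sketch computes a one-way ANOVA F statistic for the model y_ij = mu + alpha_i + e_ij, where alpha_i is the effect of level i of a single experimental variable. It is only an illustration under stated assumptions: the variable levels ("level_A", "level_B", "level_C"), the response values, and the function name one_way_anova are hypothetical and are not taken from the curated Brucella data or from the analysis software used in this study.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way fixed-effects ANOVA.

    `groups` maps each level of one experimental variable to a list of
    observed responses (hypothetical input layout, chosen for illustration).
    """
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)

    # Variation explained by the variable: group means versus the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Residual variation: observations versus their own group mean.
    ss_within = sum(
        (y - mean(values)) ** 2
        for values in groups.values()
        for y in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical protection measurements for three levels of one variable.
# These numbers are invented for the example and carry no biological meaning.
protection_by_level = {
    "level_A": [1.0, 2.0, 3.0],
    "level_B": [2.0, 3.0, 4.0],
    "level_C": [3.0, 4.0, 5.0],
}

f_stat, df_b, df_w = one_way_anova(protection_by_level)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic is then compared against the F distribution with the reported degrees of freedom to decide whether the variable contributes significantly to the response. A multi-variable analysis such as the one in this study would fit all variables in a single linear model rather than one variable at a time.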