C.T. Lim and J.C.H. Goh (Eds.): WCB 2010, IFMBE Proceedings 31, pp. 1595–1598, 2010. www.springerlink.com

A Cross-Format Framework for Consistent Information Integration among Molecular Pathways and Ontologies

R. Umeton¹,², B. Yankama¹, G. Nicosia³, and C.F. Dewey Jr.¹

¹ Massachusetts Institute of Technology, Cambridge MA, United States
² University of Calabria, Rende CS, Italy
³ University of Catania, Catania CT, Italy

Abstract— The information coming from biomedical ontologies and runnable pathways is expanding continuously: research communities keep this process up and their advances are generally shared by means of dedicated resources published on the web. Having different objectives and different abstraction levels, most of these resources “speak” different languages. Employing an extensible collection of interpreters, we propose a system that abstracts the information from different resources and combines them together into a common meta-format. Preserving the resource independence, we provide an alignment service that can be used for multiple purposes. Two recent examples are: 1) The new web application Cytosolve uses an embedded version of this system to provide congruous parallel simulation of multiple models; 2) Using the BioModels.net database, a searchable dictionary of equivalent molecular reaction paths was built. Finally, the enriched knowledge can be exported in OWL and queried by semantically-enabled tools such as Protégé. In this approach, we see a valuable tool to integrate and test information originating from different sources, while preserving the independence of the model curation process.

Keywords— Algorithms in Bioinformatics; Multi-scale modeling: algorithm development & applications.

I. INTRODUCTION

The information about molecular processes is expanding continuously and the descriptions are shared in the form of computable pathways.
Biomedical ontologies are being created to provide a semantic context for the molecular species that they contain. Current advances in both areas suggest an information integration cycle based on shared knowledge bases [1]. One can envision ontology resources, such as ChEBI [2] and BioPortal [3], defining the biological context of the pathways in a machine-readable format. It is desirable to inform databases of runnable pathways, such as the BioModels.net collection [4] and the CellML repository [5], with the information contained in the curated molecular ontologies, in a manner that is easy to use. However, there is still a large chasm between today's functionality and the true ability to use ontological data to inform molecular pathways. In this paper, we describe our cross-format system, called OREMP (Ontology repository and Editor for Molecular Pathways), which aims to bridge this gap, and we present practical examples where its use led to consistent information integration, knowledge discovery, and joint simulation of biochemical systems.

In the next section, the methodologies used in our system are detailed. Section III presents part of the results we achieved, which are discussed in Section IV. Section V concludes the paper and offers some insight into our ongoing research.

II. MATERIALS AND METHODS

In order to step beyond simple syntactical translation, we propose a system that merges the information from molecular pathways and curated biological ontologies into extended ontologies using a specific meta-format. This meta-format has been designed to embed the minimal, quantitative, MIRIAM-compliant [6] information derived from different pathways. Model annotations are preserved and extended with supplemental quantitative data to achieve a common description that can be represented as a single ontology. The structure of this ontology is presented in Table 1.
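As a rough illustration of the entity structure listed in Table 1, the following Python sketch models each entity as a dataclass. The class names, the snake_case field names, the use of lists in place of sets, the plain-string representation of the kinetic formula, the helper annotation_uris, and all example values are assumptions made for this sketch; none of them is taken from the OREMP implementation.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Annotation:
    type: str          # kind of reference (illustrative values only)
    uri: str           # identifier of the referenced ontology term
    information: str   # free-text note attached to the reference


@dataclass
class Pathway:
    fullname: str
    hooks: List[Annotation] = field(default_factory=list)


@dataclass
class Parameter:
    name: str
    value: float


@dataclass
class Species:
    name: str
    internal_id: str
    initial_value: float
    in_pathway: Pathway
    hooks: List[Annotation] = field(default_factory=list)


@dataclass
class KineticReaction:
    internal_id: str
    kinetics: str  # assumption: the formula is kept as a plain string
    kinetic_parameters: List[Parameter]
    in_pathway: Pathway
    reactants: List[Species] = field(default_factory=list)
    catalysts: List[Species] = field(default_factory=list)
    products: List[Species] = field(default_factory=list)
    hooks: List[Annotation] = field(default_factory=list)


def annotation_uris(entity) -> Set[str]:
    """Hypothetical helper: collect the URIs of an entity's annotations."""
    return {a.uri for a in entity.hooks}


# Placeholder data, invented for this example only.
pathway = Pathway(fullname="Example pathway")
hook = Annotation(type="is", uri="urn:example:species-a", information="placeholder")
species_a = Species("A", "s1", 1.0, pathway, [hook])
species_b = Species("B", "s2", 0.0, pathway)
reaction = KineticReaction(
    internal_id="r1",
    kinetics="k1 * A",
    kinetic_parameters=[Parameter("k1", 0.5)],
    in_pathway=pathway,
    reactants=[species_a],
    products=[species_b],
)
print(annotation_uris(species_a))
```

Lists are used for the SET OF fields because mutable dataclasses are not hashable by default; a fuller implementation might prefer frozen classes and true sets.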
Table 1 Our extended ontology structure

Entity            Has
Annotation        type:STRING, uri:STRING, information:STRING
Species           name:STRING, internalId:STRING, initialValue:REAL, inPathway:PATHWAY, hooks:SET OF ANNOTATIONS
Kinetic reaction  internalId:STRING, kinetics:FORMULA, kineticParameters:SET OF PARAMETERS, inPathway:PATHWAY, reactants:SET OF SPECIES, catalysts:SET OF SPECIES, products:SET OF SPECIES, hooks:SET OF ANNOTATIONS
Parameter         name:STRING, value:REAL
Pathway           fullname:STRING, hooks:SET OF ANNOTATIONS

Operatively, the information (i.e., species, reactions and references to ontologies) coming from heterogeneous resources is abstracted into our internal meta-format through these modular computational steps: