Exploring Gene Ontology Annotations with OWL

Simon Jupp 1*, Robert Stevens 1 and Robert Hoehndorf 2

1 School of Computer Science, University of Manchester, UK.
2 Department of Genetics, University of Cambridge, Cambridge, UK

ABSTRACT

Motivation: Ontologies such as the Gene Ontology (GO) and their use in annotations make cross-species comparisons of genes possible, along with a wide range of other activities. Tools such as AmiGO allow exploration of genes based on their GO annotations. This human-driven exploration and querying of GO is obviously useful, but by taking advantage of the ontological representation we can use these annotations to create a rich polyhierarchy of proteins for enhanced querying. This also opens up possibilities for exploring GOA for redundancies and defects in annotations. To do this we have created a set of OWL classes for mouse GOA genes. Each gene is represented as a class, with the appropriate relationships to the GO aspects with which it has been annotated. We then use defined classes to query these protein classes and to build a complex hierarchy. This standard use of OWL affords a rich interaction with GO annotations to give a fine partitioning of the proteins in the ontology.

1 INTRODUCTION

The creation of the Gene Ontology (GO) (Harris 2004) has had a major impact on the description and communication of the major functionalities of gene products for many species. GO has some 24,000 terms for annotating gene products and is used in around 40 species databases and in cross-species databases such as UniProt and InterPro (Camon 2004). It is widely used for querying such databases, making cross-species comparisons or in data analyses, such as over-expression analysis in microarray data (Baehrecke 2004). The GO is mainly used as a controlled vocabulary to ensure genes are consistently annotated using standard terminology across many data resources; this alone offers many benefits for data integration and analysis.
GO is, however, much more than a vocabulary; it also provides additional information about how these GO terms are related to each other. These relationships have a well-defined semantics that bring added value to the GO. For example, the hierarchical relationships allow for all kinds of a particular term to be retrieved, as well as those with an annotation of the term itself. These and other relationships provide support for navigation, as well as making explicit the relationship between the entities being described.

* To whom correspondence should be addressed.

The AmiGO browser (Carbon 2009) (see also DynGO (Liu 2005), QuickGO (Binns 2009)) provides such an interface and exploits the hierarchical structure of the Gene Ontology to support query expansion. For example, when searching AmiGO for receptor activity genes, the results returned also include genes involved in GPCR activity because GPCR activity is a subclass of receptor activity. This hierarchical structure is also useful for data mining tasks (Pavlidis 2004). Enrichment analysis is a common technique used in the analysis of high-throughput gene expression data; sets of interesting genes can be grouped or clustered based on common GO annotations (see http://www.geneontology.org/GO.tools.shtml for more GO tools).

Whilst highly useful, many of these tools fail to exploit the full potential of the GO's representation for reasoning and querying over gene annotations. Most of the tools that were investigated do not facilitate rich querying that takes into account the semantics of the GO. For example, it was difficult to ask for all proteins that are located in a membrane or part of a membrane, that are receptor proteins involved in a metabolic process. To answer such a query correctly some form of reasoning over the ontology is required. The ability to perform such rich queries would enable more precise and flexible exploration of the GO annotations.
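The kind of reasoning such a query needs can be illustrated with a minimal Python sketch. This is not the OWL-based approach described in this paper; it is a plain graph traversal over a toy fragment, intended only to show why the is_a and part_of relationships must be followed to answer the query. The term labels resemble GO labels, but the edges are simplified, and the gene names (GeneA, GeneB, GeneC) and their annotations are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Edges = Dict[str, Set[str]]

# Simplified, illustrative edges; not a faithful copy of GO.
IS_A: Edges = {
    "GPCR activity": {"receptor activity"},
    "plasma membrane": {"membrane"},
    "integral to membrane": {"membrane part"},
    "lipid metabolic process": {"metabolic process"},
}
PART_OF: Edges = {
    "membrane part": {"membrane"},
}

# Hypothetical genes and annotations, one set of terms per GO aspect.
ANNOTATIONS: Dict[str, Dict[str, Set[str]]] = {
    "GeneA": {"function": {"GPCR activity"},
              "component": {"integral to membrane"},
              "process": {"lipid metabolic process"}},
    "GeneB": {"function": {"receptor activity"},
              "component": {"nucleus"},
              "process": {"metabolic process"}},
    "GeneC": {"function": {"kinase activity"},
              "component": {"plasma membrane"},
              "process": {"lipid metabolic process"}},
}


def closure(term: str, *edge_maps: Edges) -> Set[str]:
    """Return the term plus every term reachable through the given edge maps."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for edges in edge_maps:
            for parent in edges.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return seen


def matches(terms: Iterable[str], target: str, *edge_maps: Edges) -> bool:
    """True if any annotated term is the target or leads to it via the edges."""
    return any(target in closure(term, *edge_maps) for term in terms)


def genes_with_function(target: str) -> List[str]:
    """Query expansion: genes annotated with the target or any of its subclasses."""
    return sorted(gene for gene, ann in ANNOTATIONS.items()
                  if matches(ann["function"], target, IS_A))


def membrane_receptors_in_metabolism() -> List[str]:
    """Receptors located in, or part of, a membrane and involved in metabolism."""
    return sorted(
        gene for gene, ann in ANNOTATIONS.items()
        if matches(ann["function"], "receptor activity", IS_A)
        and matches(ann["component"], "membrane", IS_A, PART_OF)
        and matches(ann["process"], "metabolic process", IS_A)
    )


if __name__ == "__main__":
    print(genes_with_function("receptor activity"))
    print(membrane_receptors_in_metabolism())
```

In this toy data only GeneA satisfies all three conditions, and it does so only because the traversal follows both is_a and part_of edges; matching on the literal annotation strings would return nothing. A hand-written traversal like this hard-codes the semantics of each relationship, which is one reason to delegate the work to an automated reasoner.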
The Web Ontology Language (OWL) 1 and the Open Biomedical Ontology (OBO) 2 format have a strict semantics that makes it possible to use automated reasoners to help build and use knowledge captured in an ontology. In order to explore the potential of reasoning over the GO annotations we need to describe the relationships between the genes and their annotations within a framework that can also exploit the semantics encoded into the GO. Our approach uses the Web Ontology Language, for which a mapping from OBO exists, to represent the GO annotations alongside the GO itself, so that both the ontology and its annotations can be exploited for querying and exploration.

1 http://www.w3.org/TR/owl-ref/
2 http://obofoundry.org/

As an ontology of attributes of gene products, GO itself does not explicitly contain gene products; GO annotations