The Age-Phenome Knowledgebase: an example of handling abstraction and expressiveness in a knowledge domain

Nophar Geifman and Eitan Rubin *

Department of Microbiology and Immunology, Faculty of Medical Sciences and the National Institute of Biotechnology in the Negev, Ben Gurion University, Israel

ABSTRACT

Background: It is well recognized that age and health are related, and a large body of published research links specific ages to disease conditions and other phenotypes. A better understanding of age-phenotype relationships has the potential to lead to new findings, but these data are currently expressed within published works in a way that makes their retrieval and integration prohibitively difficult. Addressing this requires a knowledgebase whose underlying model has the level of abstraction and degree of expressiveness needed to organize information from the literature usefully. Recently, we described the Age-Phenome Knowledgebase (APK), a computational platform for storing and retrieving information about age-related phenotypic patterns. To make it a useful tool, we developed a way of employing ontologies and standardized vocabularies. This approach is detailed and evaluated here.

Methods and results: The Age-Phenome Knowledgebase contains a range of evidence sources, such as scientific publications and clinical data analyses, that contain connections between specific ages or age groups and phenotypes such as diseases. To describe these elements of age, disease and other forms of phenotype, selected ontologies and fixed vocabularies were incorporated into the APK. This approach provides two key advantages. First, it provides a standardized, unambiguous method for describing the diseases, phenotypes, ages and other relevant terms used to organize the knowledge items. Second, it allows abstraction to higher-order concepts.
Ages and age groups are described using the ‘Age Ontology’, a simple ontology developed for this purpose and based on the description of age ranges in the Medical Subject Headings (MeSH). The Disease Ontology (DO) is used in APK to represent diseases, while other forms of phenotype are described by a subset of the Unified Medical Language System (UMLS) Metathesaurus. Complex searches are enabled in the APK by abstracting over the hierarchical structures of the Age Ontology and the Disease Ontology. The selection of ontologies and vocabularies for representing diseases and other phenotypes in APK was guided by the decision to develop the APK as an open resource, keeping the use of licensed resources to a minimum. The intention is that the APK can be further developed and refined through community usage.

Conclusions: Making integral and extensive use of ontologies and vocabularies is shown to allow diseases and age groups to be represented in a standard, unambiguous way. Furthermore, the use of ontologies in the APK allows abstraction, making it straightforward for researchers to develop and conduct complex queries. APK therefore provides an example of how ontologies can be used in the rapid development of new knowledge models, and offers practical insight into the factors that determine the applicability of ontologies in biomedical research.

* To whom correspondence should be addressed: erubin@bgu.ac.il

1 INTRODUCTION

Age plays an important role in medicine and biomedical research. A patient's age may affect the course and progression of a disease (Diamond et al., 1989; Hasenclever & Diehl, 1998), may be an important factor in determining the correct course of treatment (Vecht, 1993), and could have an impact on the normal values of various biomedical markers (Fliss et al., 2008; Rubin et al., 2011). The relationships between age and human disease have been extensively investigated over the years.
As a result of these investigations, a significant quantity of data exists linking specific ages or age ranges with disease. Until recently, however, data about age-phenotype associations were not systematically organized and could not be studied methodically. For example, searching for scientific articles describing phenotypic changes reported to occur at a given age was extremely difficult: such a search usually meant scanning a large number of papers to find the few works (if any) that discuss events or trends at a particular age.

Many efforts have been made to formally represent biomedical knowledge, most successfully in the molecular biology domain (Ashburner et al., 2000). Some of these efforts are concerned with human phenotypes; however, most do not represent the connection of these phenotypes to age. The Human Phenotype Ontology (HPO) (Robinson et al., 2008), which provides a standardized vocabulary of phenotypic abnormalities encountered in human disease, includes a set of terms that allow the description of some age-related information such as age of onset (for example, the term 'Onset in early adulthood'), but it does not include detailed knowledge linking age to phenotype. Online Mendelian Inheritance in Man (OMIM) (Rashbass, 1995), a major phenotype-related knowledgebase, maintains knowledge about age and its links to disease, yet this knowledge is not structured and covers only a limited range of phenotypes (i.e., genetic disorders).

Recently, we reported on the development of the Age-Phenome Knowledgebase (APK), in which knowledge about age-related phenotypic patterns and events can be modeled and stored for retrieval (Geifman & Rubin, 2011). The knowledgebase offers a method for holding a structured representation of literature-derived knowledge about clinically relevant traits and trends that occur at different ages, such as disease symptoms and disease propensity.
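To illustrate what abstracting over the age and disease hierarchies means for querying, the following is a minimal Python sketch, not the APK implementation. It assumes that each knowledge item carries exactly one age term and one disease term, and that each hierarchy is a tree stored as a child-to-parent map. The age terms are loosely modeled on MeSH age groups, while the disease terms, the evidence strings and the names `KnowledgeItem`, `ancestors_or_self` and `query` are hypothetical placeholders. A query term matches an item when it equals the item's annotation or is one of its ancestors, so a search for "Adult" also retrieves items annotated with "Middle Aged" or "Aged, 80 and over".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Illustrative age hierarchy, loosely modeled on MeSH age groups (an assumption,
# not the actual Age Ontology). Maps each child term to its single parent term.
AGE_PARENT: Dict[str, str] = {
    "Infant": "Age Groups",
    "Infant, Newborn": "Infant",
    "Child": "Age Groups",
    "Child, Preschool": "Child",
    "Adolescent": "Age Groups",
    "Adult": "Age Groups",
    "Young Adult": "Adult",
    "Middle Aged": "Adult",
    "Aged": "Adult",
    "Aged, 80 and over": "Aged",
}

# Hypothetical placeholder disease hierarchy standing in for DO terms.
DISEASE_PARENT: Dict[str, str] = {
    "disease class A": "disease",
    "disease class B": "disease",
    "disease A1": "disease class A",
    "disease A2": "disease class A",
    "disease B1": "disease class B",
}


@dataclass(frozen=True)
class KnowledgeItem:
    """One evidence fragment annotated with an age term, a disease term and a relation."""
    evidence: str
    age_term: str
    disease_term: str
    relation: str  # e.g. "Age of Onset" or "Age of Diagnosis"


def ancestors_or_self(term: str, parent: Dict[str, str]) -> List[str]:
    """Return the term followed by each of its ancestors, up to the root."""
    chain = [term]
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def query(items: List[KnowledgeItem],
          age: Optional[str] = None,
          disease: Optional[str] = None,
          relation: Optional[str] = None) -> List[KnowledgeItem]:
    """Return items whose annotations equal, or are descendants of, the query terms."""
    hits = []
    for item in items:
        if age is not None and age not in ancestors_or_self(item.age_term, AGE_PARENT):
            continue
        if disease is not None and disease not in ancestors_or_self(item.disease_term, DISEASE_PARENT):
            continue
        if relation is not None and item.relation != relation:
            continue
        hits.append(item)
    return hits


# Toy annotations with placeholder evidence; they encode no real findings.
ITEMS = [
    KnowledgeItem("text fragment 1", "Aged, 80 and over", "disease A1", "Age of Onset"),
    KnowledgeItem("text fragment 2", "Middle Aged", "disease A2", "Age of Diagnosis"),
    KnowledgeItem("text fragment 3", "Child, Preschool", "disease B1", "Age of Onset"),
]

if __name__ == "__main__":
    # A query for the broad term "Adult" also matches "Middle Aged" and "Aged, 80 and over".
    print([i.evidence for i in query(ITEMS, age="Adult")])
    # → ['text fragment 1', 'text fragment 2']
```

The single-parent map keeps the sketch short. An ontology such as DO is a directed acyclic graph in which a term may have several parents, so a fuller version would collect ancestors with a breadth-first traversal over all parents instead of following one chain.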
In APK, evidential text fragments are stored and linked to the specific age (or age range) and phenotype(s) discussed in the text. The nature of the connection between age and phenotype is represented by five different types of relationships: 'Age of Onset', 'Age of Diagnosis', 'Age of Observa-