Enabling Enrichment Analysis with the Human Disease Ontology
Paea LePendu, PhD*, Mark A. Musen, MD, PhD, and Nigam H. Shah, MBBS, PhD
Stanford Center for Biomedical Informatics Research, 251 Campus Drive, Medical School Office
Building, Room X215, Mail Code 5479, Stanford University, Stanford, CA 94305-5479, USA
Abstract
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays
result in long lists of “significant genes.” One way to gain insight into the significance of altered
expression levels is to determine whether Gene Ontology (GO) terms associated with a particular
biological process, molecular function, or cellular component are over- or under-represented in the
set of genes deemed significant. This process, referred to as enrichment analysis, profiles a
gene-set, and is widely used to make sense of the results of high-throughput experiments. Our goal is to
develop and apply general enrichment analysis methods to profile other sets of interest, such as
patient cohorts from the electronic medical record, using a variety of ontologies including
SNOMED CT, MedDRA, RxNorm, and others.
Although it is possible to perform enrichment analysis using ontologies other than the GO, a key
prerequisite is the availability of a background set of annotations to enable the enrichment
calculation. In the case of the GO, this background set is provided by the Gene Ontology
Annotations. In the current work, we describe: (i) a general method that uses hand-curated GO
annotations as a starting point for creating background datasets for enrichment analysis using other
ontologies; and (ii) a gene–disease background annotation set, which enables disease-based
enrichment, to demonstrate the feasibility of our method.
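To make the role of the background annotation set concrete, the following is a minimal sketch of an over-representation calculation. It assumes a one-sided hypergeometric test, which is a common choice for enrichment analysis but is not stated here as the authors' method. The function names, gene identifiers, and term labels are hypothetical, and the sketch omits under-representation and multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k annotated genes when n genes
    are drawn without replacement from N background genes, K of which
    carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment(gene_set, background):
    """Score every term seen in the gene set against the background.

    background: dict mapping gene -> set of annotating terms (hypothetical layout).
    Returns {term: (k, K, p_value)}.
    """
    N = len(background)
    # Only genes present in the background can be scored.
    study = [g for g in gene_set if g in background]
    n = len(study)
    terms = set().union(*(background[g] for g in study)) if study else set()
    results = {}
    for term in terms:
        K = sum(1 for annotated in background.values() if term in annotated)
        k = sum(1 for g in study if term in background[g])
        results[term] = (k, K, hypergeom_upper_tail(k, K, n, N))
    return results


# Toy background (assumed data): 10 genes, all carry term_b, three also carry term_a.
background = {f"g{i}": {"term_b"} for i in range(10)}
for g in ("g0", "g1", "g2"):
    background[g] = {"term_a", "term_b"}

result = enrichment(["g0", "g1", "g2"], background)
```

In this toy example the gene set is exactly the three genes annotated with term_a, so term_a receives a small p-value, whereas term_b, which annotates every background gene, carries no signal. Without the background dictionary neither K nor N would be defined, which is why a background annotation set is needed before any ontology can be used this way.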
Keywords
Enrichment Analysis; Human Disease; Ontology; Annotation; Information Integration
1. Introduction
One way to gain insight into the significance of a particular set of genes is to determine
whether functional terms that are associated with each gene are over- or under-represented
in the set of genes deemed significant. This process, referred to as enrichment analysis,
profiles a gene-set, and is widely used to make sense of the results of high-throughput
experiments such as gene-expression assays. The canonical example of enrichment analysis
is in the interpretation of a list of differentially expressed genes in some condition. The usual
approach is to perform enrichment analysis with the Gene Ontology (GO). We can
aggregate the annotating GO concepts associated with a particular biological process,
© 2011 Elsevier Inc. All rights reserved
*Corresponding Author: plependu@stanford.edu, 650-721-5821, fax 650-725-7944.
Conflicts of Interest The authors declare that there are no conflicts of interest.
J Biomed Inform. Author manuscript; available in PMC 2012 December 01.
Published in final edited form as:
J Biomed Inform. 2011 December ; 44(Suppl 1): S31–S38. doi:10.1016/j.jbi.2011.04.007.