INVESTIGATION
Conserved Motifs and Prediction of Regulatory Modules in Caenorhabditis elegans
Guoyan Zhao,*,1 Nnamdi Ihuegbu,*,1 Mo Lee,† Larry Schriefer,* Ting Wang,* and Gary D. Stormo*,2
*Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, and †Brigham Young University, Provo, Utah 84602
ABSTRACT Transcriptional regulation, a primary mechanism for controlling the development of multicel-
lular organisms, is carried out by transcription factors (TFs) that recognize and bind to their cognate binding
sites. In Caenorhabditis elegans, our knowledge of which genes are regulated by which TFs, through
binding to specific sites, is still very limited. To expand our knowledge about the C. elegans regulatory
network, we performed a comprehensive analysis of the C. elegans, Caenorhabditis briggsae, and
Caenorhabditis remanei genomes to identify regulatory elements that are conserved in all three genomes. Our analysis
identified 4959 elements that are significantly conserved across the genomes and that each occur multiple
times within each genome, both hallmarks of functional regulatory sites. Our motifs show significant
matches to known core promoter elements, TF binding sites, splice sites, and poly-A signals as well as
many putative regulatory sites. Many of the motifs are significantly correlated with various types of exper-
imental data, including gene expression patterns, tissue-specific expression patterns, and binding site
location analysis as well as enrichment in specific functional classes of genes. Many can also be significantly
associated with specific TFs. Combinations of motif occurrences allow us to predict the location of cis-
regulatory modules and we show that many of them significantly overlap experimentally determined
enhancers. We provide access to the predicted binding sites, their associated motifs, and the predicted
cis-regulatory modules across the whole genome through a web-accessible database and as tracks for
genome browsers.
KEYWORDS
cis-regulatory element
cis-regulatory module
transcription factor
transcriptional regulation
Caenorhabditis elegans
The development of an organism is largely controlled by transcrip-
tional regulation that determines where and when every gene is ex-
pressed. A first step toward the understanding of how genomic DNA
controls the development of an organism is to understand the mech-
anisms that control differential gene expression. Transcriptional reg-
ulation is carried out by transcription factors (TFs) via their binding to
specific DNA sequences. Binding sites of TFs can be represented as
consensus sequences, but position weight matrices (PWMs) provide
a more quantitative description of the specificity of a TF (Stormo
2000). Currently our knowledge of the TFs and their binding sites
is very limited. For example, the human genome has more than 2000
predicted TFs (Lander et al. 2001), but only a few hundred have
quantitative models of their specificity, primarily determined using
computational tools that have been developed to facilitate the
identification of PWMs for TFs (reviewed in GuhaThakurta 2006).
Furthermore, although computational methods can successfully identify
binding sites that are bound by a particular TF in vitro, most of the
predicted binding sites are not functional in vivo (Li et al. 2011;
Whittle et al. 2009). Previous studies have shown that
TF binding sites tend to cluster together to direct tissue/temporal-
specific gene expression (Arnone and Davidson 1997; Kirchhamer
et al. 1996). These clusters of binding sites that regulate expression
are referred to as cis-regulatory modules (CRMs). Clustering of TF
binding sites, along with phylogenetic conservation and other
measures of “regulatory potential,” has been widely used in the
computational prediction of CRMs and is a more reliable indicator of in vivo
regulatory function of DNA sequences than individual predicted binding
sites (Blanchette et al. 2006; Ferretti
et al. 2007; King et al. 2005; Kolbe et al. 2004; Sinha et al. 2006; Taylor
et al. 2006; Wasserman and Sandelin 2004).
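To make the two ideas in this paragraph concrete, the following minimal Python sketch shows how a log-odds PWM can score candidate sites and how nearby hits can be grouped into candidate clusters. It is an illustration under stated assumptions, not the method used in this study: the aligned sites, the uniform background, the pseudocount, the score threshold, and the function names (`build_pwm`, `score`, `scan`, `cluster_hits`) are all hypothetical, and the gap-based grouping is a deliberately simple stand-in for CRM prediction that ignores phylogenetic conservation.

```python
import math

# Hypothetical aligned binding sites for an imaginary TF (illustration only).
SITES = ["TGATAA", "TGATAG", "AGATAA", "TGATAA", "CGATAA"]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(sites, background, pseudocount=1.0):
    """Build a log-odds PWM: one {base: log2(p / background)} dict per column."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for column in zip(*sites):  # iterate over alignment columns
        pwm.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background[base]
            )
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


def scan(pwm, sequence, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score(pwm, sequence[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits


def cluster_hits(positions, max_gap, min_sites):
    """Group hits separated by at most max_gap; keep groups with >= min_sites hits."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append((current[0], current[-1]))
    return clusters
```

As a usage sketch, `scan(build_pwm(SITES, BACKGROUND), "ACGTTGATAAACGT", threshold=5.0)` reports the window that matches the consensus, and the start positions of hits from several motifs could be pooled and passed to `cluster_hits` to flag regions dense in predicted sites. Unlike a consensus sequence, the PWM assigns graded scores, so near-consensus sites are ranked rather than simply accepted or rejected.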
Copyright © 2012 Zhao et al.
doi: 10.1534/g3.111.001081
Manuscript received September 8, 2011; accepted for publication February 6, 2012
This is an open-access article distributed under the terms of the Creative
Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/),
which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Supporting information is available online at
http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.111.001081/-/DC1
1 These authors contributed equally to this work.
2 Corresponding author: Department of Genetics, 660 S. Euclid, Campus Box 8232,
St. Louis, MO 63110. E-mail: stormo@wustl.edu
Volume 2 | April 2012 | 469