INVESTIGATION

Conserved Motifs and Prediction of Regulatory Modules in Caenorhabditis elegans

Guoyan Zhao,*,1 Nnamdi Ihuegbu,*,1 Mo Lee, Larry Schriefer,* Ting Wang,* and Gary D. Stormo*,2

*Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, and Brigham Young University, Provo, Utah 84602

ABSTRACT Transcriptional regulation, a primary mechanism for controlling the development of multicellular organisms, is carried out by transcription factors (TFs) that recognize and bind to their cognate binding sites. In Caenorhabditis elegans, our knowledge of which genes are regulated by which TFs, through binding to specific sites, is still very limited. To expand our knowledge about the C. elegans regulatory network, we performed a comprehensive analysis of the C. elegans, Caenorhabditis briggsae, and Caenorhabditis remanei genomes to identify regulatory elements that are conserved in all genomes. Our analysis identified 4959 elements that are significantly conserved across the genomes and that each occur multiple times within each genome, both hallmarks of functional regulatory sites. Our motifs show significant matches to known core promoter elements, TF binding sites, splice sites, and poly-A signals as well as many putative regulatory sites. Many of the motifs are significantly correlated with various types of experimental data, including gene expression patterns, tissue-specific expression patterns, and binding site location analysis as well as enrichment in specific functional classes of genes. Many can also be significantly associated with specific TFs. Combinations of motif occurrences allow us to predict the location of cis-regulatory modules and we show that many of them significantly overlap experimentally determined enhancers. We provide access to the predicted binding sites, their associated motifs, and the predicted cis-regulatory modules across the whole genome through a web-accessible database and as tracks for genome browsers.
KEYWORDS cis-regulatory element; cis-regulatory module; transcription factor; transcriptional regulation; Caenorhabditis elegans

The development of an organism is largely controlled by transcriptional regulation that determines where and when every gene is expressed. A first step toward understanding how genomic DNA controls the development of an organism is to understand the mechanisms that control differential gene expression. Transcriptional regulation is carried out by transcription factors (TFs) via their binding to specific DNA sequences. Binding sites of TFs can be represented as consensus sequences, but position weight matrices (PWMs) provide a more quantitative description of the specificity of a TF (Stormo 2000). Currently our knowledge of the TFs and their binding sites is very limited. For example, the human genome has more than 2000 predicted TFs (Lander et al. 2001), but only a few hundred have quantitative models of their specificity, primarily determined by computational tools that have been developed to facilitate the identification of PWMs for TFs (reviewed in GuhaThakurta 2006). Furthermore, although computational methods can successfully identify binding sites that are bound by a particular TF in vitro, most of the predicted binding sites are not functional in vivo (Li et al. 2011; Whittle et al. 2009). Previous studies have shown that TF binding sites tend to cluster together to direct tissue/temporal-specific gene expression (Arnone and Davidson 1997; Kirchhamer et al. 1996). These clusters of binding sites that regulate expression are referred to as cis-regulatory modules (CRMs). Clustering of TF binding sites, along with phylogenetic conservation and other measures of regulatory potential, has been widely used in the computational prediction of CRMs and is a more reliable indicator of in vivo regulatory function of DNA sequences (Blanchette et al. 2006; Ferretti et al. 2007; King et al. 2005; Kolbe et al.
2004; Sinha et al. 2006; Taylor et al. 2006; Wasserman and Sandelin 2004).

Copyright © 2012 Zhao et al.
doi: 10.1534/g3.111.001081
Manuscript received September 8, 2011; accepted for publication February 6, 2012
This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.111.001081/-/DC1
1 These authors contributed equally to this work.
2 Corresponding author: Department of Genetics, 660 S. Euclid, Campus Box 8232, St. Louis, MO 63110. E-mail: stormo@wustl.edu
Volume 2 | April 2012 | 469
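The Introduction contrasts consensus sequences with position weight matrices and describes CRMs as clusters of binding sites. The following Python sketch illustrates both ideas in a minimal form; it is not the method used in this study. The aligned sites, the uniform background frequencies, the pseudocount, the score threshold, and the `max_gap` and `min_sites` parameters are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical aligned binding sites for an imaginary TF (illustration only).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TGAGTCA", "TTACTCA"]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(sites, background=BACKGROUND, pseudocount=1.0):
    """Build a log2-odds PWM: one dict of base -> score per position."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in "ACGT"})
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of PWM length."""
    return sum(column[base] for column, base in zip(pwm, seq))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


def cluster_hits(positions, max_gap, min_sites):
    """Group site positions into candidate modules.

    Consecutive sites no more than max_gap apart join the same cluster;
    clusters with fewer than min_sites sites are discarded.
    Returns (first_position, last_position) for each retained cluster.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append((current[0], current[-1]))
    return clusters


pwm = build_pwm(SITES)
# A consensus match is all-or-nothing; the PWM grades each candidate site.
for candidate in ("TGACTCA", "TGACTAA", "AAAAAAA"):
    print(candidate, round(score(pwm, candidate), 2))
```

Unlike a consensus string, the PWM assigns a graded score, so a site with one tolerated substitution ranks between a perfect match and an unrelated sequence. The clustering step here uses only distance between sites; it ignores phylogenetic conservation and the other measures of regulatory potential mentioned in the text.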