Pure Appl. Chem., Vol. 74, No. 6, pp. 899–905, 2002.
© 2002 IUPAC
Microbial computational genomics of gene regulation*
Julio Collado-Vides‡, Gabriel Moreno-Hagelsieb, and Arturo Medrano-Soto
Program of Computational Genomics, CIFN-UNAM, Av. Universidad s/n,
Cuernavaca, 62100 Morelos, Mexico
Abstract: Escherichia coli is a free-living bacterium that embodies a large legacy of
knowledge, the result of years of experimental work in molecular biology. It represents a point of
departure for analyses and comparisons with the ever-increasing number of finished microbial
genomes. For years, we have been gathering knowledge from the literature on transcriptional
regulation and operon organization in E. coli K-12 and organizing it in a relational
database, RegulonDB. RegulonDB contains information on 20–25 % of the expected total
set of regulatory interactions at the level of transcription initiation. We have used this
knowledge to develop computational methods that predict the missing sets in the genome of E. coli,
focusing on the prediction of promoters, regulatory sites, regulatory proteins, operons, and
transcription units. These predictions constitute separate pieces of a single puzzle. By putting
them all together, we shall be able to predict the complete set of regulatory interactions and
the transcription unit organization of E. coli. Orthologs in other genomes of genes known to be
co-regulated in E. coli, along with their corresponding predicted operons and their
predicted transcriptional regulators, should permit the extension of this goal to many
more microbial genomes.
INTRODUCTION
The accumulated knowledge of gene regulation and gene function in E. coli K-12 is
unparalleled by almost any other model organism. The body of molecular biology knowledge on E. coli,
summarized in the E. coli and Salmonella books of Neidhardt and collaborators [1,2], illustrates the
legacy that E. coli represents.
Our laboratory has been devoted for years to a systematic search of the literature for known
mechanisms of regulation of transcription initiation, as well as operon organization, in E. coli.
This information is contained in RegulonDB, a relational database of known and computationally
predicted elements, which is available at <http://www.cifn.unam.mx/Computational_Biology/regulondb/>.
The knowledge gathered so far gives us 528 known transcription units, 624 mapped promoters,
close to 100 terminators [3], 165 DNA-binding transcriptional regulators [4], as well as 2128 genes with
a known functional class assigned [5]. This experimentally supported knowledge is complemented with
comprehensive computational predictions that estimate a total of 700 operons [6] and a total of 314
transcriptional regulators [4].
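As a rough illustration of how the experimentally known set compares with the predicted total, the regulator counts quoted above can be turned into a coverage fraction. This is only a sketch: the variable and function names are illustrative, and it assumes that the predicted total of 314 regulators includes the 165 experimentally supported ones.

```python
# Counts quoted in the text above; the names are illustrative, not RegulonDB fields.
known_regulators = 165      # experimentally supported DNA-binding regulators [4]
predicted_regulators = 314  # predicted total of transcriptional regulators [4]


def coverage(known: int, predicted_total: int) -> float:
    """Return the fraction of the predicted total that is already known.

    Assumes the predicted total includes the known elements.
    """
    if predicted_total <= 0:
        raise ValueError("predicted_total must be positive")
    return known / predicted_total


regulator_coverage = coverage(known_regulators, predicted_regulators)
print(f"Known regulators: {regulator_coverage:.1%} of the predicted total")
# prints "Known regulators: 52.5% of the predicted total"
```

Under that assumption, roughly half of the predicted regulators already have experimental support. The same ratio is not computed for transcription units and operons, because the two are different kinds of objects and their counts are not directly comparable.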
*Plenary lecture presented at the International Conference on Bioinformatics 2002: North–South Networking, Bangkok,
Thailand, 6–8 February 2002. Other presentations are presented in this issue, pp. 881–914.
‡Corresponding author: E-mail: collado@cifn.unam.mx
Furthermore, we have worked on the prediction of operator sites for the binding of
transcriptional regulators [7]. Sequence and positional analyses of known sigma 70 promoters have permitted us