The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli

Shan Sundararaj, Anchi Guo, Bahram Habibi-Nazhad, Melania Rouani¹, Paul Stothard, Michael Ellison¹ and David S. Wishart*

Faculty of Pharmacy and Pharmaceutical Sciences and ¹Department of Biochemistry, University of Alberta, Edmonton, Alberta T6G 2N8, Canada

Received August 15, 2003; Accepted October 13, 2003

ABSTRACT

The CyberCell Database (CCDB: http://redpoll.pharmacy.ualberta.ca/CCDB) is a comprehensive, web-accessible database designed to support and coordinate international efforts in modeling an Escherichia coli cell on a computer. The CCDB brings together both observed and derived quantitative data from numerous independent sources covering many aspects of the genomic, proteomic and metabolomic character of E.coli (strain K12). The database is self-updating but also supports 'community' annotation, and provides an extensive array of viewing, querying and search options, including a powerful, easy-to-use relational data extraction system.

BACKGROUND

Escherichia coli is perhaps the most completely characterized microorganism in existence. The quantity of information known about this Gram-negative bacterium, in combination with its amenability to wet-lab studies and relatively simple cellular structure, has made it the organism of choice for several international efforts in cellular simulation (1). Project CyberCell (www.projectcybercell.com), which is part of the International E.coli Alliance, is one of these efforts. This large-scale multidisciplinary project involves both the acquisition of new quantitative data about E.coli (strain K12) and the collation or back-filling of nearly 50 years of pre-existing E.coli information covering all aspects of the genomic, proteomic and metabolomic character of this organism.
In an effort to coordinate both the back-filling and the ongoing experimental studies being conducted on E.coli for these simulation efforts, we have built a web-accessible data repository called the CyberCell Database (CCDB). The intent of the CCDB is not to duplicate the many excellent E.coli resources that already exist [such as EcoCyc (2), SwissProt (3), EcoGene (4) and MultiFun (5)], but to facilitate the collection, correction, coordination and storage of the key information needed to simulate E.coli on a computer.

Cellular simulation is an intrinsically data-intensive endeavour, requiring a very broad range of data and data types. This requirement has made it essential to integrate and compile as much as possible of the available molecular data describing all aspects of E.coli (strain K12) into a single, easily accessible resource, including: (i) DNA, RNA and protein sequence data; (ii) gene and protein names, alternative names or abbreviations; (iii) extensive functional or ontological information; (iv) gene position and protein location; (v) macromolecular secondary, tertiary and quaternary structure data; (vi) protein, metabolite and RNA expression levels, copy numbers and concentrations; (vii) protein interaction and protein stoichiometry information; (viii) enzyme rate constants; (ix) metabolite structures, reactions and pathways; (x) lists of cofactors and ligands, as well as dozens of other pieces of quantitative molecular data. To compile, confirm and validate this comprehensive collection of data, several hundred journal articles, more than two dozen different electronic databases and a dozen in-house or web-based programs were searched, accessed, compared, written or run over the course of 2 years. On average, each gene, protein or metabolite entry in the CCDB contains more than 70 separate biomolecular data fields, filled to varying levels of completeness.
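The notion of an entry whose fields are filled to varying levels of completeness can be illustrated with a minimal sketch. The record type, its field names and the completeness measure below are hypothetical and chosen only for illustration; they are not the CCDB schema, and a real entry would carry more than 70 fields rather than five.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GeneEntry:
    """Hypothetical record; field names are illustrative, not the CCDB schema."""
    gene_name: str
    protein_sequence: Optional[str] = None
    cellular_location: Optional[str] = None
    copy_number: Optional[int] = None
    rate_constant_per_s: Optional[float] = None


def completeness(entry: GeneEntry) -> float:
    """Return the fraction of fields that hold a value (are not None)."""
    all_fields = fields(entry)
    filled = sum(1 for f in all_fields if getattr(entry, f.name) is not None)
    return filled / len(all_fields)


# Two of the five fields are filled in this example entry.
entry = GeneEntry(gene_name="lacZ", cellular_location="cytoplasm")
print(f"{completeness(entry):.2f}")  # prints 0.40
```

Representing unknown values explicitly as `None`, rather than omitting them, keeps every entry the same shape, so a per-entry completeness figure can be computed uniformly as fields are back-filled.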
As the scope of the CyberCell project expands, the number and completeness of the data fields are also expected to expand, with some information being updated continuously as new experimental data becomes available. A complete listing of the current data fields as well as the web resources and programs used to assemble the CyberCell database is provided at the CCDB home page.

*To whom correspondence should be addressed. Tel: +1 780 492 0383; Fax: +1 780 492 1071; Email: david.wishart@ualberta.ca

Nucleic Acids Research, 2004, Vol. 32, Database issue D293–D295. DOI: 10.1093/nar/gkh108. © Oxford University Press 2004; all rights reserved

DATABASE DESCRIPTION

The CCDB is actually a composite of four browsable databases: (i) the main CyberCell database (CCDB, containing gene and protein information); (ii) the 3D structure database (CC3D, containing information for structural proteomics); (iii) the RNA database (CCRD, containing tRNA and rRNA information) and (iv) the metabolite database (CCMD, containing metabolite information). Each of these is accessible through hyperlinked buttons located at the top of the CCDB home page. All the CCDB sub-databases are fully web