COMPUTER CORNER
TIBS 22 - DECEMBER 1997

PDBsum: a Web-based database of summaries and analyses of all PDB structures

There are currently over 6000 three-dimensional structures of biological macromolecules - primarily proteins and nucleic acids - in the Brookhaven Protein Data Bank (PDB)¹. This number is doubling every two years and hence will be over 12 000 by the end of the millennium. This steadily increasing volume of data requires some quick and simple means of access and organization. Of interest are not only the data stored in each PDB file, i.e. the names, sequences and formulae of the molecule(s), the authors who solved the structure, literature references, experimental details and of course the three-dimensional atomic coordinates, but also derived structural data not present in the file. Furthermore, there is much additional information that can be provided, such as details of the molecule's function and which are its closest relatives in terms of sequence, structural similarity and/or function.

[Figure 1 panel: PDBsum header details]
PDB code: 1aaw   Aminotransferase
Structure: Aspartate aminotransferase wild type complex with pyridoxal-5'-phosphate
Source: (Escherichia coli)
Resolution: 2.40 Å, R-factor: 0.208
Authors: S.C.Almo, D.L.Smith, A.T.Danishefsky, D.Ringe
Date: 13-Jul-93
Enzyme Classification number: E.C.2.6.1.1
Further information: [...] (including references), [...] Browser and coords, [...] entry, [...] and [...] classification, [...] summary, PROMOTIF analyses.
SWISS-PROT entry: AAT_ECOLI

Figure 1
Part of a PDBsum entry showing the header details for PDB code 1aaw (an aspartate aminotransferase). The thumbnail picture at the top left shows a schematic diagram of the molecules in the entry; in this case, the structure is that of a complex between a single protein chain, shown schematically in purple, with helices represented by cylinders and strands by arrows, and a ligand, shown in a space-filling representation. An expanded version of the thumbnail picture can be obtained by clicking on it, and a VRML version can be viewed with an appropriate viewer via the VRML button. The RasMol button below the picture downloads the atomic coordinates of the complex, enabling the molecules to be viewed in RASMOL (or any other PDB viewer for which the browser has been configured). Below these buttons come various items of information taken from the PDB header file, such as the resolution, R-factor, authors, etc., and a number of links to further analyses and other Web-based databases. The analyses include the protein's CATH classification, a summary PROCHECK analysis, including a Ramachandran plot, and a PROMOTIF overview. The other databases include SWISS-PROT and our own E.C. → PDB database of enzyme structures in the PDB.

The World Wide Web (WWW) provides an ideal tool for making such data readily accessible to the scientific community, and indeed there are already a large number of excellent Web-based servers that provide information about protein and nucleic acid structures. The PDB has its own Web-based search engine called PDBBrowse², which allows a complete text search of the entries in the PDB (http://www.pdb.bnl.gov). Each entry matching a search query provides not just the contents of the PDB structure file itself, but also links to various other Web databases containing additional information. These databases include the Swiss-3Dimage collection³, Entrez's Molecular Modeling Database (MMDB)⁴, the SCOP⁵⁻⁷ and CATH⁸ structural classifications of proteins, and finally the PDBsum database, which is the subject of this article.

There are other sites that concentrate on specific protein families. The proWeb network⁹ provides links to dedicated WWW sites specializing in specific protein families (http://www.blocks.fhcrc.org/~steveh/protein.html); for example, the protein kinase family (http://www.sdsc.edu/indx/frameindex.html) and the G protein-coupled receptors (http://receptor.mgh.harvard.edu/GCRDBHOME.html). For nucleic acid structures, the Nucleic Acid Database (NDB)¹⁰ at Rutgers University, New Jersey, USA provides a comprehensive source of information (http://ndbserver.rutgers.edu).

PDBsum

Here, we describe a Web-based database called PDBsum (http://www.biochem.ucl.ac.uk/bsm/pdbsum), which aims to complement the data already available on protein and nucleic acid structures from the various sources described above. The database provides a summary of the molecules in each PDB file (i.e. proteins, nucleic acids, ligands, water molecules and metals) together with various analyses of their structural features. The majority of the structural analyses come from software developed over the past few years by our group at University College London, UK. Also provided are extensive links to most of the existing sites mentioned above, plus others.

Copyright © 1997, Elsevier Science Ltd. All rights reserved. 0968-0004/97/$17.00 PII: S0968-0004(97)01140-7
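
The growth estimate quoted in the opening paragraph (over 6000 structures, doubling every two years) can be checked with a short calculation. The sketch below is illustrative only: it assumes simple exponential growth with a fixed two-year doubling time, and the function names `projected_count` and `years_to_reach` are hypothetical helpers introduced here, not part of PDBsum or the PDB.

```python
import math


def projected_count(initial, years_elapsed, doubling_time=2.0):
    """Projected number of entries, assuming the count doubles
    once every `doubling_time` years (an idealized model)."""
    return initial * 2 ** (years_elapsed / doubling_time)


def years_to_reach(initial, target, doubling_time=2.0):
    """Years needed to grow from `initial` to `target` under the same model."""
    return doubling_time * math.log2(target / initial)


# Approximate figure quoted in the text for December 1997.
start = 6000

for years in (0, 1, 2, 4):
    print(f"{years} years on: about {projected_count(start, years):.0f} entries")
```

Under this assumption, one doubling period (two years) takes the 6000 entries to 12 000, which matches the figure given in the text; real deposition rates will of course deviate from a fixed doubling time.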