COMPUTER CORNER
TIBS 22 - DECEMBER 1997
PDBsum: a Web-based database
of summaries and analyses
of all PDB structures
There are currently over 6000 three-
dimensional structures of biological
macromolecules - primarily proteins
and nucleic acids - in the Brookhaven
Protein Data Bank (PDB) 1. This number
is doubling every two years and
hence will be over 12 000 by the end of
the millennium.
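The projection above is simple doubling arithmetic. As a minimal sketch, assuming steady exponential growth and taking the end of 1997 as the starting point (the function name and the choice of start year are illustrative assumptions, not part of the PDB or PDBsum), it can be written as:

```python
def projected_count(start_count, start_year, target_year, doubling_years=2.0):
    """Project a count that doubles every `doubling_years` years.

    Hypothetical helper for illustration only; it assumes the doubling
    rate stays constant over the whole interval.
    """
    elapsed = target_year - start_year
    return start_count * 2 ** (elapsed / doubling_years)


# One doubling period separates the end of 1997 from the end of 1999.
print(round(projected_count(6000, 1997, 1999)))
```

Under the same assumption, each further two-year period would double the total again, so the estimate is only as good as the assumed growth rate.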
This steadily increasing volume of data
requires some quick and simple means
of access and organization. Of interest
are not only the data stored in each PDB
file, i.e. the names, sequences and for-
mulae of the molecule(s), the authors
who solved the structure, literature references,
experimental details and of
PDB code: 1aaw
Aminotransferase
Structure: Aspartate aminotransferase wild type complex with
pyridoxal-5'-phosphate
Source: (Escherichia coli)
Resolution: 2.40 Å, R-factor: 0.208
Authors: S.C.Almo, D.L.Smith, A.T.Danishefsky, D.Ringe
Date: 13-Jul-93
Enzyme Classification number: E.C.2.6.1.1
Further information: ~ (including references), ~ Browser and
coords, ~ entry, ~ and ~ classification, ~ summary,
PROMOTIF analyses.
SWISS-PROT entry: AAT_ECOLI
Figure 1
Part of a PDBsum entry showing the header details for PDB code 1aaw (an aspartate
aminotransferase). The thumbnail picture at the top left shows a schematic diagram of the
molecules in the entry; in this case, the structure is that of a complex between a single
protein chain, shown schematically in purple, with helices represented by cylinders and strands by
arrows, and a ligand, shown schematically in a space-filling representation. An expanded
version of the thumbnail picture can be obtained by clicking on it, and a VRML version can
be viewed with an appropriate viewer via the VRML button. The RasMol button below the
picture downloads the atomic coordinates of the complex, enabling the molecules to be
viewed in RASMOL (or any other PDB viewer for which the browser has been configured).
Below these buttons come various items of information taken from the PDB header file,
such as the resolution, R-factor, authors, etc., and a number of links to further analyses and
other Web-based databases. The analyses include the protein's CATH classification, a
summary PROCHECK analysis, including a Ramachandran plot, and a PROMOTIF overview.
The other databases include SWISS-PROT and our own E.C. → PDB database of enzyme
structures in the PDB.
course the three-dimensional atomic co-
ordinates, but also derived structural
data not present in the file. Furthermore,
there is much additional information
that can be provided, such as details of
the molecule's function and which are
its closest relatives in terms of sequence,
structural similarity and/or function.
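To illustrate the kind of data held directly in a PDB file, the sketch below reads the fixed-column HEADER record of a PDB-format file (classification in columns 11-50, deposition date in columns 51-59, identifier in columns 63-66). The function name is hypothetical, and the sample line is constructed here from the values shown in Figure 1 rather than copied from the actual 1aaw file.

```python
def parse_header_record(line):
    """Extract classification, deposition date and ID code from a
    PDB-format HEADER record (fixed-column layout).

    Hypothetical helper for illustration; not part of PDBsum.
    """
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    padded = line.ljust(80)  # PDB records are up to 80 columns wide
    return {
        "classification": padded[10:50].strip(),   # columns 11-50
        "deposition_date": padded[50:59].strip(),  # columns 51-59
        "id_code": padded[62:66].strip(),          # columns 63-66
    }


# Sample line assembled from the Figure 1 values (an assumption, not file content).
sample = "HEADER".ljust(10) + "AMINOTRANSFERASE".ljust(40) + "13-JUL-93" + "   " + "1AAW"
print(parse_header_record(sample))
```

Derived data such as secondary structure or structural classifications are not stored in these records, which is why they have to be computed or looked up separately.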
The World Wide Web (WWW) provides
an ideal tool for making such data
readily accessible to the scientific com-
munity, and indeed there are already a
large number of excellent Web-based
servers that provide information about
protein and nucleic acid structures. The
PDB has its own Web-based search engine
called PDBBrowse 2, which allows a
complete text search of the entries in
the PDB (http://www.pdb.bnl.gov). Each
entry matching a search query provides
not just the contents of the PDB structure
file itself, but also links to various other
Web databases containing additional
information. These databases include
the Swiss-3DImage collection 3, Entrez's
Molecular Modeling Database (MMDB) 4,
the SCOP 5-7 and CATH 8 structural
classifications of proteins, and finally the
PDBsum database, which is the subject
of this article.
There are other sites that concentrate
on specific protein families. The proWeb
network 9 provides links to dedicated
WWW sites specializing in specific protein
families (http://www.blocks.fhcrc.org/~steveh/protein.html);
for example, the protein kinase family (http://www.sdsc.
edufindx/framehldex.html) and the G
protein-coupled receptors (http://receptor.
mgh.harvard.edu/GCRDBHOME.html).
For nucleic acid structures, the Nucleic
Acid Database (NDB) 10 at Rutgers
University, New Jersey, USA provides a
comprehensive source of information
(http://ndbserver.rutgers.edu).
PDBsum
Here, we describe a Web-based database
called PDBsum (http://www.biochem.ucl.ac.uk/bsm/pdbsum),
which
aims to complement the data already
available on protein and nucleic acid struc-
tures from the various sources described
above. The database provides a summary
of the molecules in each PDB file (i.e. pro-
teins, nucleic acids, ligands, water mol-
ecules and metals) together with vari-
ous analyses of their structural features.
The majority of the structural analyses
come from software developed over the
past few years by our group at University
College London, UK. Also provided are
extensive links to most of the existing
sites mentioned above, plus others.
Copyright © 1997, Elsevier Science Ltd. All rights reserved. 0968-0004/97/$17.00 PII: S0968-0004(97)01140-7