Analysis of a Data Set of Paired Uncomplexed Protein
Structures: New Metrics for Side-Chain Flexibility and
Model Evaluation
Shanrong Zhao, David S. Goodsell, and Arthur J. Olson*
Department of Molecular Biology, Scripps Research Institute, La Jolla, California
ABSTRACT We compiled and analyzed a data set of paired protein structures containing proteins for which multiple high-quality uncomplexed atomic structures were available in the Protein Data Bank. Side-chain flexibility was quantified, yielding a set of residue- and environment-specific confidence levels describing the range of motion around the χ1 and χ2 angles. As expected, buried residues were inflexible, adopting similar conformations in different crystal structure analyses. Ile, Thr, Asn, Asp, and the large aromatics also showed limited flexibility when exposed on the protein surface, whereas exposed Ser, Lys, Arg, Met, Gln, and Glu residues were very flexible. This information is different from and complementary to the information available from rotamer surveys. The confidence levels are useful for assessing the significance of observed side-chain motion and for estimating the extent of side-chain motion in protein structure prediction. We compare the performance of a simple 40° threshold with these quantitative confidence levels in a critical evaluation of side-chain prediction with the program SCWRL. Proteins 2001;43:271–279.
© 2001 Wiley-Liss, Inc.
Key words: side-chain flexibility; protein structure
prediction
INTRODUCTION
Proteins combine structural rigidity with local flexibility. Most natural proteins adopt a defined folded structure, with secondary-structure segments arranged in a defined geometry. Layered on top of this relatively rigid core are several levels of flexibility: occasionally, the motion of entire domains alters the overall shape of the protein; often, the motion of connecting loops and terminal extensions modifies the shape of a cleft or extension; and in all proteins, side-chain motion alters the local topography.¹
Side-chain conformation is determined by the intrinsic
torsional flexibility of each residue, which is then limited
by a combination of external factors: steric contacts with
the local peptide backbone, interactions with neighboring
parts of the protein, and interactions with surrounding
proteins and solvents.
Most analyses of side-chain conformation study the range of motion available to a given residue type, but they do not analyze the flexibility of a given residue within a given protein environment. In a typical study, a database of representative structures is chosen from the Protein Data Bank (PDB), and the range of conformations is tabulated for each type of residue. For χ1 angles, this yields the familiar three-peaked histograms, showing that amino acids generally prefer the three staggered conformations [Fig. 1(A)]. These histograms may be used to generate rotamer libraries for protein structure prediction by picking a representative set of conformations that will cover most of the commonly observed (and, therefore, energetically favored) ranges.
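The three-peaked χ1 histograms described above come from measuring each side chain's χ1 torsion and binning it into one of three staggered wells. A minimal, dependency-free Python sketch of that bookkeeping follows, assuming atom coordinates are already available as (x, y, z) tuples. The function names, the p/t/m labels (p near +60°, t near 180°, m near -60°), and the well boundaries at 0° and ±120° are illustrative choices, not the procedure used in this article.

```python
import math

# Small vector helpers on (x, y, z) tuples, to keep the sketch dependency-free.
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees, in [-180, 180], about the p1-p2 bond.

    For chi1 the four atoms are N, CA, CB and the gamma atom (CG, OG, SG, ...).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def staggered_well(chi):
    """Assign a chi angle to a staggered well (assumed boundaries at 0 and +/-120 degrees)."""
    chi = (chi + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if 0.0 <= chi < 120.0:
        return "p"  # near +60
    if -120.0 <= chi < 0.0:
        return "m"  # near -60
    return "t"      # near 180

def well_histogram(chis):
    """Count how many chi angles fall into each staggered well."""
    counts = {"p": 0, "t": 0, "m": 0}
    for chi in chis:
        counts[staggered_well(chi)] += 1
    return counts
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` gives approximately 90, and `well_histogram([60, -60, 180, 175, 55])` returns `{"p": 2, "t": 2, "m": 1}`. A rotamer survey of the kind described above amounts to running such a histogram over one structure per protein, which is exactly why it cannot reveal position-specific flexibility.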
These analyses, however, do not yield information on the flexibility of a given residue within a protein. All residues, whether buried or exposed, are surrounded by other residues that limit their range of motion. Some positions will allow motion between different rotameric states, but other positions with stronger restraints will not allow such flexibility. Because only a single structure of each protein is included in rotamer surveys, location-specific interactions tend to average out, and the results primarily reflect the steric contacts with the main chain of adjacent residues, which are consistent across the entire test set. Rotamer analyses reveal the conformations that are most energetically favorable when averaged over all environments, but a different approach must be taken to determine the flexibility of individual residues within the environment of a given protein.
Instead of surveying a single representative of each protein, we compared several different structures of each protein, looking for differences in side-chain conformation among the different structure solutions. In this way, we could look at each position, such as Arg14 in lysozyme, individually, determining its range of motion and the effect of the local environment on this motion. In this article, we report quantitative values describing the ranges of amino acid flexibility observed in uncomplexed protein structures. This information has important implications for the design and evaluation of protein structure prediction methods.
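The core operation in this paired-structure approach is comparing the same side-chain torsion in two structure solutions of one protein. Because χ angles are periodic, the difference must be wrapped so that, for instance, 170° and -170° are 20° apart rather than 340°. The Python sketch below is a minimal illustration under stated assumptions: the dictionary layout and function names are hypothetical, the angles are made up, and the default cutoff simply reuses the 40° threshold mentioned in the abstract. It does not implement the residue- and environment-specific confidence levels reported in this article.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d

def compare_side_chains(struct_a, struct_b, threshold=40.0):
    """Compare chi angles of residues present in both structure solutions.

    struct_a, struct_b: hypothetical dicts mapping a residue label (e.g. "ARG14")
    to a tuple of chi angles in degrees (chi1, chi2, ...); None marks an undefined angle.
    Returns {label: (per-chi differences, moved flag)}, where the flag is True if any
    defined chi differs by more than `threshold` degrees.
    """
    report = {}
    for label in sorted(struct_a.keys() & struct_b.keys()):
        diffs = [None if (x is None or y is None) else angle_diff(x, y)
                 for x, y in zip(struct_a[label], struct_b[label])]
        moved = any(d is not None and d > threshold for d in diffs)
        report[label] = (diffs, moved)
    return report

# Made-up angles for illustration only (not data from this study).
form_1 = {"ARG14": (-65.0, 175.0), "ILE23": (-60.0, 170.0), "SER30": (60.0,)}
form_2 = {"ARG14": (-170.0, 180.0), "ILE23": (-55.0, -178.0)}
for label, (diffs, moved) in compare_side_chains(form_1, form_2).items():
    print(label, diffs, "moved" if moved else "unchanged")
# ARG14 [105.0, 5.0] moved
# ILE23 [5.0, 12.0] unchanged
```

A single fixed cutoff treats every residue type and environment alike; replacing `threshold` with a value looked up per residue type and burial state would be one way to move toward the kind of position-aware analysis motivated above.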
Manuscript 13192-MB from the Scripps Research Institute.
Grant sponsor: National Institutes of Health; Grant number: PO1 HL16411.
*Correspondence to: Arthur Olson, Department of Molecular Biology, Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037. E-mail: olson@scripps.edu
Received 5 September 2000; Accepted 22 January 2001
Published online 00 Month 2001
PROTEINS: Structure, Function, and Genetics 43:271–279 (2001)
© 2001 WILEY-LISS, INC.