Review: What Can Structural Classifications Reveal
about Protein Evolution?
Christine A. Orengo, Ian Sillitoe, Gabrielle Reeves, and Frances M. G. Pearl
Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT, United Kingdom
Received December 19, 2000, and in revised form June 19, 2001
In this article we present a review of the methods
used for comparing and classifying protein struc-
tures. We discuss the hierarchies and populations of
fold groups and evolutionary families in some of the
major classifications and we consider some of the
problems confronting any general analyses of struc-
tural evolution in protein families. We also review
some more recent analyses that have expanded
these classifications by identifying sequence relatives
in the genomes and have thereby revealed interesting
trends in fold usage and recurrence. © 2001 Academic
Press
INTRODUCTION
Proteins are known to evolve by mutations in the
amino acid residues comprising their polypeptide
chains and by insertions and deletions of these residues.
How extensive can these changes be? Are
some types of structure more tolerant of change than
others, and some, conversely, more profoundly affected?
How do these changes correlate with changes
in the functions of the proteins? Most importantly,
how should we organise the structural data so as to
answer these questions accurately and informatively?
And are the current data sufficient to
provide meaningful answers for more than
a few highly populated and well-studied
families?
Since the 1970s there has been an exponential
increase in the number of protein structures determined,
and the Protein Data Bank, now held at the
Research Collaboratory for Structural Bioinformatics
(RCSB) at Rutgers (Berman et al., 2000), currently
contains over 15 000 entries. In parallel, over the
past 5 years a number of structural classifications
have arisen based on a variety of philosophies for
recognising structural similarities and for clustering
proteins on the basis of these similarities. These
range from predominantly manual to completely automated
protocols (see Holm and Sander, 1994a, and
Orengo, 1994, for reviews). More recently the data
contained within some of these classifications and
“collections” have been expanded up to 10-fold by
including extensive sequence data from the genomes
(Pearl et al., 2000; Wang et al., 2000; Teichmann et
al., 2000). This is largely due to improvements in
protocols for searching sequence databases (Park et
al., 1998) that have enabled rapid identification of
clear homologues to structural families in these da-
tabases.
This expansion of both structural and sequence
data has allowed a more profound analysis of evolu-
tionary repertoires within the known protein struc-
tural families, particularly regarding function. Fur-
thermore, because structure is much more highly
conserved within a family than sequence, this map-
ping of structural families to gene sequences is be-
ginning to provide interesting new insights into the
phylogeny of organisms and the mechanisms by
which gene duplication and recruitment can expand
the functional repertoire of an organism. With the
expected increases in structural data promised by
the international structural genomics initiatives
(Rost, 1998; Shapiro and Lima, 1998) and the prob-
able increases in performance of the 1D–3D predic-
tion methods, which will be accelerated and vali-
dated by the CASP competitions in the United
States (Moult et al., 1999), we can expect our under-
standing of protein evolution to increase consider-
ably over the next decade.
In this review we will first address the major
themes in structural comparison and classification
and describe the various classifications, commenting
on the populations of fold groups and superfamilies
within them. Recent methods for assessing struc-
tural plasticity within protein families will be dis-
cussed together with the effects of functional and
physicochemical constraints on structural diver-
gence. Finally, we will review some recent analyses
that have combined the structural data from these
classifications with data from sequence relatives to
Journal of Structural Biology 134, 145–165 (2001)
doi:10.1006/jsbi.2001.4398, available online at http://www.idealibrary.com on