Review: What Can Structural Classifications Reveal about Protein Evolution?

Christine A. Orengo, Ian Sillitoe, Gabrielle Reeves, and Frances M. G. Pearl

Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT, United Kingdom

Received December 19, 2000, and in revised form June 19, 2001

In this article we present a review of the methods used for comparing and classifying protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications, and we consider some of the problems confronting any general analysis of structural evolution in protein families. We also review some more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes and have thereby revealed interesting trends in fold usage and recurrence. © 2001 Academic Press

INTRODUCTION

Proteins are known to evolve by mutations in the amino acid residues comprising their polypeptide chains and by insertions and deletions of these residues. How extensive can these changes be? Are some types of structure more tolerant of change than others and, conversely, are some more profoundly affected? How do these changes correlate with changes in the functions of the proteins? Most importantly, how should we organise the structural data so as to answer these questions accurately and informatively, and are the current data sufficient to provide meaningful answers for more than a few highly populated and well-studied families?

Since the 1970s there has been an exponential increase in the number of protein structures determined, and the Protein Data Bank, now held by the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers (Berman et al., 2000), currently contains over 15 000 entries.
In parallel, over the past 5 years a number of structural classifications have arisen, based on a variety of philosophies for recognising structural similarities and for clustering proteins on the basis of these similarities. These range from predominantly manual to completely automated protocols (see Holm and Sander, 1994a, and Orengo, 1994, for reviews). More recently, the data contained within some of these classifications and "collections" have been expanded up to 10-fold by including extensive sequence data from the genomes (Pearl et al., 2000; Wang et al., 2000; Teichmann et al., 2000). This is largely due to improvements in protocols for searching sequence databases (Park et al., 1998), which have enabled rapid identification of clear homologues of the structural families in these databases.

This expansion of both structural and sequence data has allowed a more profound analysis of evolutionary repertoires within the known protein structural families, particularly regarding function. Furthermore, because structure is much more highly conserved than sequence within a family, this mapping of structural families to gene sequences is beginning to provide interesting new insights into the phylogeny of organisms and the mechanisms by which gene duplication and recruitment can expand the functional repertoire of an organism. With the increases in structural data promised by the international structural genomics initiatives (Rost, 1998; Shapiro and Lima, 1998) and the probable increases in performance of the 1D–3D prediction methods, which will be accelerated and validated by the CASP competitions in the United States (Moult et al., 1999), we can expect our understanding of protein evolution to increase considerably over the next decade.
Journal of Structural Biology 134, 145–165 (2001). doi:10.1006/jsbi.2001.4398, available online at http://www.idealibrary.com. Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved.

In this review we will first address the major themes in structural comparison and classification and describe the various classifications, commenting on the populations of fold groups and superfamilies within them. Recent methods for assessing structural plasticity within protein families will be discussed together with the effects of functional and physicochemical constraints on structural divergence. Finally, we will review some recent analyses that have combined the structural data from these classifications with data from sequence relatives to