Protein folding and the organization of the protein topology universe

Kresten Lindorff-Larsen 1, Peter Røgen 2, Emanuele Paci 3, Michele Vendruscolo 1 and Christopher M. Dobson 1

1 University of Cambridge, Department of Chemistry, Lensfield Road, Cambridge, UK, CB2 1EW
2 Department of Mathematics, Technical University of Denmark, Building 303, DK-2800 Kongens Lyngby, Denmark
3 University of Zürich, Department of Biochemistry, Winterthurerstrasse 190, 8057 Zürich, Switzerland

The mechanism by which proteins fold to their native states has been the focus of intense research in recent years. The rate-limiting event in the folding reaction is the formation of a conformation in a set known as the transition-state ensemble. The structural features present within such ensembles have now been analysed for a series of proteins using data from a combination of biochemical and biophysical experiments together with computer-simulation methods. These studies show that the topology of the transition state is determined by a set of interactions involving a small number of key residues and, in addition, that the topology of the transition state is closer to that of the native state than to that of any other fold in the protein universe. Here, we review the evidence for these conclusions and suggest a molecular mechanism that rationalizes these findings by presenting a view of protein folds that is based on the topological features of the polypeptide backbone, rather than the conventional view that depends on the arrangement of different types of secondary-structure elements. By linking the folding process to the organization of the protein structure universe, we propose an explanation for the overwhelming importance of topology in the transition states for protein folding.

The widespread application of the methods of structural biology is beginning to provide a comprehensive picture of the variety of possible native folds available to proteins [1–4].
Folding to these structures is, in most cases, the final and crucial step in the transformation of genetic information into a specific biological function. A full understanding of the mechanisms by which folding occurs therefore represents the solution to a central problem in molecular biology [5–7].

Procedures have recently been developed to provide a molecular description of the conformations that are rate-limiting in the folding of a given protein, the transition-state ensemble (TSE; see Glossary), by incorporating the results of a mutational analysis of folding kinetics into computer simulations [8]. Using this approach, ensembles of conformations representing the TSE have been determined for a series of proteins [8–15]. Examination of these ensembles has shown that establishing the correct overall topology of the polypeptide chain is a crucial aspect of protein folding. This observation is in accord with a series of studies that have shown that the folding rate of a protein, to a first approximation, can be related to the entropic cost of forming the native-like topology [16–22].

The structural changes occurring during protein folding have also been analysed in detail for a series of proteins and we discuss some of these studies here. The results enable the topological view of folding to be reconciled with the well-established concept of nucleation [23] by showing that – despite the many different ways in which a given topology could, in principle, be generated – individual proteins use interactions between a specific and limited set of residues to define the fold [8,15,23,24]. Together with methods that aim to describe protein structures in terms of general topological quantities [25,26], these results indicate how the underlying principles that determine the native-state structure are closely related to those that guide the protein-folding reaction.
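To make the idea of a general topological quantity concrete, the sketch below estimates the measure given as an example in the Glossary: the number of times chain segments appear to cross one another, averaged over all viewing directions. It is an illustration only and not the implementation used in [26]. It assumes that the backbone is represented as a polyline (for example, consecutive Cα positions) and uses the standard solid-angle expression for a pair of straight segments. The function names `pair_crossing_fraction` and `average_crossing_number` are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    """Return a normalized copy of a, or None if a is (numerically) zero."""
    norm = math.sqrt(_dot(a, a))
    if norm < 1e-12:
        return None
    return (a[0] / norm, a[1] / norm, a[2] / norm)

def pair_crossing_fraction(p1, p2, p3, p4):
    """Fraction of viewing directions from which segment p1-p2 appears
    to cross segment p3-p4 (a value between 0 and 1)."""
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    # Normals to the four great-circle sides of the spherical quadrilateral
    # spanned by the directions joining points on the two segments.
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # assumption: collinear (degenerate) geometry contributes nothing
    omega = 0.0
    for k in range(4):
        d = _dot(normals[k], normals[(k + 1) % 4])
        omega += math.asin(max(-1.0, min(1.0, d)))  # clamp against rounding
    # The quadrilateral and its antipode together cover 2*omega of the
    # 4*pi sphere of viewing directions.
    return abs(omega) / (2.0 * math.pi)

def average_crossing_number(points):
    """Sum the crossing fraction over all pairs of non-adjacent segments."""
    n_segments = len(points) - 1
    total = 0.0
    for i in range(n_segments):
        for j in range(i + 2, n_segments):  # neighbours share a vertex; skipped
            total += pair_crossing_fraction(points[i], points[i + 1],
                                            points[j], points[j + 1])
    return total

# Toy four-point chain (not protein data): the first and last segments are
# perpendicular and one unit apart.
toy_chain = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -1.0, 1.0), (0.0, 1.0, 1.0)]
toy_acn = average_crossing_number(toy_chain)
```

For the toy chain, the only non-adjacent pair of segments is seen as crossing from one third of all directions, so `toy_acn` is 1/3. A chain that lies flat in a plane without self-intersection gives zero. Dropping the absolute value and attaching a sign to each pair would give a signed measure instead of an unsigned crossing count.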
Glossary

Generalized Gauss integrals: A family of geometric measures constructed to classify protein conformations in terms of general conformational properties [26]. An example of one of these measures is the average number of times a chain segment crosses over and under any other segment when averaged over all directions from which the chain is seen.

Molecular dynamics simulations: A computational method to calculate the time-dependent behaviour of a molecular system. In classical molecular dynamics simulations, a force field associated with the potential energy of a protein is used and Newton’s equations of motion are integrated to sample the relevant conformations of all the atoms in a protein molecule. In restrained molecular dynamics simulations, the force field is modified to take experimental data into account, to bias the simulations towards those regions of conformation space that are consistent with the experiment. This procedure enables protein conformations that are in agreement with available experimental data (e.g. Φ-values) to be obtained even if they have free energies that are far from the minimum (e.g. at the TSE) in the unrestrained simulations [10].

Transition-state ensemble: An ensemble of conformations, the formation of which is rate-limiting for folding. If folding from the denatured state is modelled as occurring on a free-energy landscape, the transition state is associated with the largest barrier that needs to be crossed to reach the native state [5,7].

Φ-Value analysis: A kinetic method for obtaining structural information about the transition state for protein folding. Individual amino acid mutations are made throughout the protein sequence, and the effects of the mutations on the folding and unfolding kinetics, in addition to the thermodynamics of folding, are measured. The Φ-value for a specific mutation is the ratio of the stability change in the transition state and in the native state accompanying that mutation and, thus, reports the extent to which the interactions found in the native state can also be found in the transition state [7].

Corresponding authors: Vendruscolo, M. (mv245@cam.ac.uk), Dobson, C.M. (cmd44@cam.ac.uk).
Available online 7 December 2004
Opinion, TRENDS in Biochemical Sciences Vol.30 No.1 January 2005
www.sciencedirect.com 0968-0004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tibs.2004.11.008
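The ratio that defines a Φ-value in the Glossary can be illustrated with a short calculation. The sketch below is not taken from the article. It assumes simple two-state folding, so that the change in transition-state stability follows from the folding rate constants and the change in native-state stability from the ratio of folding to unfolding rate constants. The function name `phi_value`, its parameters and the rate constants in the example are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temperature=298.15):
    """Ratio of the mutation-induced stability change of the transition state
    to that of the native state, both relative to the denatured state.

    Assumes two-state kinetics: kf and ku are folding and unfolding rate
    constants (same units) for the wild type (wt) and the mutant (mut).
    """
    rt = R_KJ * temperature
    # Destabilization of the transition state relative to the denatured state.
    ddg_ts = rt * math.log(kf_wt / kf_mut)
    # Destabilization of the native state relative to the denatured state.
    ddg_native = rt * math.log((kf_wt / ku_wt) / (kf_mut / ku_mut))
    if abs(ddg_native) < 1e-9:
        raise ValueError("mutation does not change native-state stability; "
                         "the ratio is undefined")
    return ddg_ts / ddg_native

# Illustrative rate constants (invented for this example, not measured data).
phi_native_like = phi_value(kf_wt=100.0, ku_wt=1.0, kf_mut=10.0, ku_mut=1.0)
phi_denatured_like = phi_value(kf_wt=100.0, ku_wt=1.0, kf_mut=100.0, ku_mut=10.0)
```

In the first case the mutation slows folding without affecting unfolding, which gives a value of 1 and suggests that the interactions of the mutated residue are as formed in the transition state as in the native state. In the second case only unfolding is accelerated, which gives a value of 0 and suggests that those interactions are absent in the transition state. Because the ratio is undefined when the mutation leaves the native-state stability unchanged, the sketch raises an error in that case rather than returning a number.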