There has been considerable progress made over the past year in linking experimental and theoretical approaches to protein folding. Recent results from several independent lines of investigation suggest that protein folding mechanisms and landscapes are largely determined by the topology of the native state and are relatively insensitive to details of the interatomic interactions. This dependence on low-resolution structural features, rather than high-resolution detail, suggests that it should be possible to describe the fundamental physics of the folding process using relatively low-resolution models. Recent experiments have set benchmarks for testing new models and progress has been made in developing theoretical models for interpreting and predicting experimental results.

Addresses
Department of Biochemistry, Box 357350, University of Washington, Seattle, WA 98195, USA
*e-mail: dabaker@u.washington.edu

Current Opinion in Structural Biology 1999, 9:189–196
http://biomednet.com/elecref/0959440X00900189
© Elsevier Science Ltd ISSN 0959-440X

Abbreviations
CI-2 chymotrypsin inhibitor II
CspB Bacillus cold-shock protein
DC diffusion collision
HP hydrophobic–polar
SH Src homology

Introduction
An exciting feature of research into protein folding is that experimental and theoretical approaches can be closely integrated. There can be considerable synergy between the two: new experimental results can drive new theoretical developments, whereas the failures of current theories highlight areas demanding further experimental investigation. In this review, we focus on recent advances at the interface between theory and experiment in protein folding, with an emphasis on small, two-state folding proteins because of their simplicity.

Is the folding process extensively optimized by natural selection?
Before considering models for folding based on simple physical principles, it is important to determine whether the folding process has been extensively optimized by natural selection; if it has, then a workable theory must take into account the features that have been optimized. Clearly, protein stability and function are the result of extensive evolutionary optimization. Very few of the enormous number of possible sequences actually fold to a unique three-dimensional structure and, of these, perhaps an even smaller fraction carry out a useful biological function.

Are folding rates and mechanisms also subject to evolutionary optimization? This question can be investigated by comparing the folding rates of protein sequences generated in the laboratory with those of naturally occurring sequences. Phage display selection for properly folded proteins was used to retrieve, from high-complexity libraries, heavily mutated sequences of two small protein domains that retain the ability to fold [1••,2••]. Although all the variants were less stable than their parent wild-type proteins, their folding rates fluctuated around the wild-type rates, with roughly half of the variants folding faster and half folding slower than the wild-type proteins. A highly simplified SH3 domain variant, in which the vast majority of the residues outside the binding pocket were replaced by one of five amino acids (isoleucine, lysine, glutamic acid, glycine or alanine), folded several times faster than the wild-type SH3 domain. These data suggest that folding rates are not extensively optimized by natural selection, as proteins selected in the laboratory without regard to the folding rate fold as fast as or faster than naturally occurring proteins.
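The distinction drawn above between stability and folding rate can be made concrete for an idealized two-state folder, in which the free energy of unfolding depends on the ratio of the folding and unfolding rate constants, ΔG = RT ln(k_fold/k_unfold). The following minimal Python sketch is an illustration only, not an analysis from the cited studies: the rate constants are hypothetical values chosen for the example, and the function names are our own. It shows how a variant can be less stable than the wild type while folding at least as fast, provided its unfolding rate has increased.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def unfolding_free_energy(k_fold, k_unfold, temp_k=298.15):
    """Two-state stability in kcal/mol: dG = RT * ln(k_fold / k_unfold)."""
    return R_KCAL * temp_k * math.log(k_fold / k_unfold)


def fraction_faster(variant_fold_rates, wild_type_fold_rate):
    """Fraction of variants whose folding rate exceeds the wild-type rate."""
    faster = sum(1 for k in variant_fold_rates if k > wild_type_fold_rate)
    return faster / len(variant_fold_rates)


# Hypothetical rate constants in s^-1 (assumed values, not measured data).
wild_type = {"k_fold": 50.0, "k_unfold": 0.05}
variant = {"k_fold": 60.0, "k_unfold": 1.0}

dg_wild_type = unfolding_free_energy(**wild_type)
dg_variant = unfolding_free_energy(**variant)

# The variant folds slightly faster yet is less stable,
# because its unfolding rate has risen much more than its folding rate.
print(f"wild type: k_fold = {wild_type['k_fold']} s^-1, dG = {dg_wild_type:.2f} kcal/mol")
print(f"variant:   k_fold = {variant['k_fold']} s^-1, dG = {dg_variant:.2f} kcal/mol")

# Hypothetical library of variant folding rates scattered around the wild type.
library_rates = [30.0, 45.0, 60.0, 80.0]
print("fraction folding faster:", fraction_faster(library_rates, wild_type["k_fold"]))
```

Under the two-state assumption, a loss of stability with little change in the folding rate shows up mainly as a faster unfolding rate. That is one way to reconcile reduced stability with near wild-type folding rates.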
Native state topology is a critical determinant of folding rates and mechanisms

The phage display studies suggest that folding rates are not strongly dependent on the protein sequence, as large sequence perturbations have relatively small effects on folding rates. A comparison of the folding rates of homologous proteins has led to similar conclusions. For example, despite large differences in the stabilities of mesophilic, thermophilic and hyperthermophilic members of the Bacillus cold-shock protein (CspB) family, the folding rates and solvent accessibilities of the folding transition states are virtually identical for the three proteins [3••]. The interactions that are responsible for the differences in stability thus appear to be made after the folding transition state. Similar folding rates and mechanisms are also observed across the SH3 protein family [4–6].

A higher resolution comparison of folding mechanisms of homologous proteins has been made possible by the characterization of the effects of point mutations on the folding rates of the Src and spectrin SH3 domains, which have only 30% sequence identity. Despite their very different sequences, the available data suggest that their folding transition states are similar [7••,8••]. Together with the results from the homologous protein sets and the phage display libraries described in previous paragraphs, these data suggest that topology is a critical determinant of folding rate and mechanism.

Matching theory and experiment in protein folding
Eric Alm and David Baker*

The preceding studies demonstrate that native state topology is an important determinant of the folding mechanism,