COMMUNICATION

First Principles Prediction of Protein Folding Rates

Derek A. Debe and William A. Goddard III*

Materials and Process Simulation Center (MSC), Beckman Institute (139-74), Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA

Experimental studies have demonstrated that many small, single-domain proteins fold via simple two-state kinetics. We present a first principles approach for predicting these experimentally determined folding rates. Our approach is based on a nucleation-condensation folding mechanism, where the rate-limiting step is a random, diffusive search for the native tertiary topology. To estimate the rates of folding for various proteins via this mechanism, we first determine the probability of randomly sampling a conformation with the native fold topology. Next, we convert these probabilities into folding rates by estimating the rate at which a protein samples different topologies during diffusive folding. This topology-sampling rate is calculated using the Einstein diffusion equation in conjunction with an experimentally determined intra-protein diffusion constant. We have applied our prediction method to the 21 topologically distinct small proteins for which two-state rate data are available. For the 18 beta-sheet and mixed alpha-beta native proteins, we predict folding rates within an average factor of 4, even though the experimental rates vary by a factor of 4 × 10⁴. Interestingly, the experimental folding rates for the three four-helix bundle proteins are significantly underestimated by this approach, suggesting that proteins with significant helical content may fold by a faster, alternative mechanism. This method can be applied to any protein for which the structure is known and hence can be used to predict the folding rates of many proteins prior to experiment.
© 1999 Academic Press

Keywords: protein folding; kinetics; diffusion; fold topology; nucleation-condensation

*Corresponding author

One of the most important challenges in biology is to understand the relationship between the folded structure of a protein and its primary amino acid sequence. Consequently, there has been great interest in understanding how proteins fold. An important advance in 1991 was the experimental demonstration that stable intermediates were not present in the fast folding of chymotrypsin inhibitor 2 (Jackson & Fersht, 1991). Since then, two-state folding rates for 20 more small (<120 residues), topologically distinct proteins have been determined, providing sufficient rate data to begin testing quantitative aspects of proposed folding mechanisms (Jackson, 1998). Recently, Plaxco et al. (1998b) reported a statistically significant correlation between the natural log of the two-state folding rate, ln(k_f), and a measure of the native state topological complexity (contact order). This empirical observation suggests that the chemistry underlying the folding of simple, single-domain proteins may be universal, implying that a single mechanistic model might quantitatively account for the observed folding rates.

We recently proposed the Topomer-Sampling Model (TSM) of protein folding, wherein proteins fold by a two-state mechanism consisting of (Debe et al., 1999a):
(i) Topomer diffusion: random, diffusive sampling to find the native topomer (topomers are tubes of topologically equivalent conformations), followed by
(ii) Intra-topomer ordering: non-random, local conformational changes within the native topology to find the unique native state.

E-mail address of the corresponding author: wag@wag.caltech.edu

Abbreviations used: TSM, topomer-sampling model; RGP, restrained generic protein; NTP, native topology probability.

Article No. jmbi.1999.3278, available online at http://www.idealibrary.com
J. Mol. Biol.
(1999) 294, 619–625 0022-2836/99/480619–7 $30.00/0 © 1999 Academic Press
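The two-step estimate outlined in the abstract (a native topology probability converted into a folding rate through a diffusive topology-sampling rate) can be illustrated with a minimal Python sketch. It rests on assumptions that the text above does not state: the folding rate is taken as the simple product of the sampling rate and the native topology probability, and the sampling time comes from the three-dimensional Einstein relation <r²> = 6Dt. The function names and every numerical value in the usage section are hypothetical placeholders, not quantities from the paper.

```python
def einstein_time(distance, diffusion_constant, dimensions=3):
    """Mean time to diffuse `distance`, from <r^2> = 2 * dimensions * D * t."""
    if distance <= 0 or diffusion_constant <= 0:
        raise ValueError("distance and diffusion_constant must be positive")
    return distance ** 2 / (2 * dimensions * diffusion_constant)


def topology_sampling_rate(distance, diffusion_constant):
    """Topologies sampled per unit time, taken as the inverse Einstein time.

    Assumption: one new topology is sampled each time the chain diffuses
    over the characteristic length `distance`.
    """
    return 1.0 / einstein_time(distance, diffusion_constant)


def folding_rate(native_topology_probability, sampling_rate):
    """Assumed form k_f = (sampling rate) * (native topology probability)."""
    if not 0.0 <= native_topology_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return native_topology_probability * sampling_rate


if __name__ == "__main__":
    # Hypothetical placeholder inputs in arbitrary consistent units.
    length = 6.0         # characteristic diffusion length
    diffusion = 1.0      # intra-protein diffusion constant
    probability = 1e-3   # native topology probability (NTP)

    rate = topology_sampling_rate(length, diffusion)
    print("sampling rate:", rate)
    print("folding rate:", folding_rate(probability, rate))
```

Keeping the probability and the sampling rate as separate inputs mirrors the separation described in the abstract: the first depends on the native structure of each protein, while the second depends only on the diffusion constant and the chosen length scale.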