The Trp Cage: Folding Kinetics and Unfolded State Topology via Molecular Dynamics Simulations Christopher D. Snow, Bojan Zagrovic, and Vijay S. Pande* Department of Chemistry and Biophysics Program, Stanford UniVersity, Stanford, California 94305-5080 Received September 18, 2002 A number of rapidly folding proteins have been characterized in recent years. 1 These small proteins can provide the first direct comparisons between simulated and experimental protein folding kinetics and pathways. 2 Proteins have been characterized through thermodynamic sampling methods, 3 unfolding simulations, 4 and folding simulations using simple potentials. 5 Here, as described recently, 2 we use several thousand stochastic dynamics simulations in a generalized-Born implicit solvent (in atomic detail) to simulate the folding dynamics of the Trp cage mini-protein under experi- mental conditions (27 °C with full solvent viscosity, γ ) 91 ps -1 ). The Folding@Home distributed computing project was used to generate an aggregate simulation time of 100 µs(250 CPU years). First we capture the rapid relaxation from an extended starting condition to a relaxed unfolded state ensemble of thousands of conformations. With continued simulation, a small fraction of these simulations reach the folded state. Furthermore, the topology of the collapsed unfolded state closely resembles the native state. The 20-residue Trp cage (NLYIQWLKDG GPSSGRPPPS) contains a short R-helix from residues 2 through 8, a 3 10 -helix from residues 11 through 14, and a C-terminal poly-proline II helix to pack against the central tryptophan. 6 Hagen and co-workers present direct experimental evidence for cooperative two-state folding with a room-temperature folding time of 4 µs. 7 With the assumption of two-state folding kinetics, short simulations from the unfolded state under native conditions have a certain small probability of folding. The length of the simulations must, of course, exceed the barrier- crossing time to fold. Furthermore, each simulation must include the time necessary to relax from a denatured state (here modeled as a completely extended form) to the unfolded state under native conditions. The Trp cage simulations collapsed in only nanoseconds. In fact, the entire extended state ensemble relaxed to a compact unfolded state ensemble in less than 10 ns. We continued the folding simulations well past the initial relaxation (over 1000 simulations extend to 30 ns) and observed folding events consistent with a two-state folding pathway. For a two-state protein with a folding constant τ ) 4 µs, we would expect 7.5 out of 1000 molecules to fold within the first 30 ns assuming rapid relaxation to the unfolded state and rapid two-state barrier crossing. Our simulations were generated through distributed computing techniques 8 and the full set of data includes simulations of many different lengths (i.e. 3000 simulations run to 1 ns, 2000 simulations extend to 5 ns, 1000 to 30 ns, 650 to 40 ns, 160 to 80 ns). Given that the relaxed unfolded ensemble has fairly low RMSD CR (the distribution is centered about 4.8 after 30 ns) and that the relaxed folded ensemble is centered about 2.4 after 10 ns, the choice of a good RMSD CR cutoff to define “folded” conformations from the unfolded ensemble is not trivial. In Figure 1 we present the estimated folding rate given a range of reasonable RMSD CR cutoffs at and below 3.0 Å. The estimated rate varies more from the arbitrary RMSD CR cutoff than any fitting errors. Thus we present a bounded folding time estimate ranging from τ ) 1.5 µs using a 3.0 Å cutoff to 8.7 µs for a 2.5 Å cutoff. We emphasize that the estimated folding rate assumes two-state folding to completion. By capturing the folding ensemble very early into folding (tens of ns), our methodology gives us a detailed view of the unfolded state under native conditions. Recently, we have postulated that in certain cases the ensemble averaged mean structure of the unfolded state may have nativelike topology. 9 The sampling of the unfolded state of Trp cage in our simulations has allowed us to test this mean structure hypothesis. At each time point along the folding trajectory we calculated the C -C distance matrix. By averaging these matrices, we obtained mean matrices representative of the ensemble at each time point. Since these matrices do not correspond to any individual structure, we further asked which real, physical conformer in the simulated ensemble at each time point was closest to these mean matrices, such that their distance root-mean square deviation (dRMS) from the mean C -C matrix is smallest (dRMS is a standard metric for comparing two structures, i.e., distance matrices, defined as dRMS ) [2Σ i>j [D ij (1) - D ij (2)] 2 /n(n - 1)] 1/2 , where D ij (x) refers to the distance between atoms i and j in structure x and n is the total number of atoms included within each structure. We find that the mean structure is similar to the NMR structure of the native state, with dRMS and RMSD CR values within the variation of the native state. For all time points after 10 ns (after collapse), the mean structure is, on average, closer (RMSD CR ) to the NMR structure than 87% of the other ensemble members. In the most favorable example (at 33 ns) RMSD CR ) 2.1 Å. Figure 2 * Corresponding author. E-mail: pande@stanford.edu. Figure 1. Estimated Trp cage folding kinetics. (A) The time-resolved population of folded molecules in the ensemble of over a thousand folding simulations for a variety of RMSDCR cutoff values. We fit each of these population growth curves to the initial stage of a single-exponential folding reaction to determine rate constants assuming the folding reaction will reach completion at long times. With 3.0 Å τ ) 1.5 µs, 2.9 Å τ ) 2.0 µs, 2.8 Å τ ) 3.1 µs, 2.7 Å τ ) 5.5 µs, 2.6 Å τ ) 6.9 µs, 2.5 Å τ ) 8.7 µs. Published on Web 11/16/2002 14548 9 J. AM. CHEM. SOC. 2002, 124, 14548-14549 10.1021/ja028604l CCC: $22.00 © 2002 American Chemical Society