The Trp Cage: Folding Kinetics and Unfolded State Topology via Molecular
Dynamics Simulations
Christopher D. Snow, Bojan Zagrovic, and Vijay S. Pande*
Department of Chemistry and Biophysics Program, Stanford UniVersity, Stanford, California 94305-5080
Received September 18, 2002
A number of rapidly folding proteins have been characterized
in recent years.
1
These small proteins can provide the first direct
comparisons between simulated and experimental protein folding
kinetics and pathways.
2
Proteins have been characterized through
thermodynamic sampling methods,
3
unfolding simulations,
4
and
folding simulations using simple potentials.
5
Here, as described
recently,
2
we use several thousand stochastic dynamics simulations
in a generalized-Born implicit solvent (in atomic detail) to simulate
the folding dynamics of the Trp cage mini-protein under experi-
mental conditions (27 °C with full solvent viscosity, γ ) 91 ps
-1
).
The Folding@Home distributed computing project was used to
generate an aggregate simulation time of ∼100 µs(∼250 CPU
years). First we capture the rapid relaxation from an extended
starting condition to a relaxed unfolded state ensemble of thousands
of conformations. With continued simulation, a small fraction of
these simulations reach the folded state. Furthermore, the topology
of the collapsed unfolded state closely resembles the native state.
The 20-residue Trp cage (NLYIQWLKDG GPSSGRPPPS)
contains a short R-helix from residues 2 through 8, a 3
10
-helix from
residues 11 through 14, and a C-terminal poly-proline II helix to
pack against the central tryptophan.
6
Hagen and co-workers present
direct experimental evidence for cooperative two-state folding with
a room-temperature folding time of 4 µs.
7
With the assumption of
two-state folding kinetics, short simulations from the unfolded state
under native conditions have a certain small probability of folding.
The length of the simulations must, of course, exceed the barrier-
crossing time to fold. Furthermore, each simulation must include
the time necessary to relax from a denatured state (here modeled
as a completely extended form) to the unfolded state under native
conditions. The Trp cage simulations collapsed in only nanoseconds.
In fact, the entire extended state ensemble relaxed to a compact
unfolded state ensemble in less than 10 ns.
We continued the folding simulations well past the initial
relaxation (over 1000 simulations extend to 30 ns) and observed
folding events consistent with a two-state folding pathway. For a
two-state protein with a folding constant τ ) 4 µs, we would expect
∼7.5 out of 1000 molecules to fold within the first 30 ns assuming
rapid relaxation to the unfolded state and rapid two-state barrier
crossing. Our simulations were generated through distributed
computing techniques
8
and the full set of data includes simulations
of many different lengths (i.e. ∼3000 simulations run to 1 ns, ∼2000
simulations extend to 5 ns, ∼1000 to 30 ns, ∼650 to 40 ns, ∼160
to 80 ns).
Given that the relaxed unfolded ensemble has fairly low RMSD
CR
(the distribution is centered about ∼4.8 after 30 ns) and that the
relaxed folded ensemble is centered about ∼2.4 after 10 ns, the
choice of a good RMSD
CR
cutoff to define “folded” conformations
from the unfolded ensemble is not trivial. In Figure 1 we present
the estimated folding rate given a range of reasonable RMSD
CR
cutoffs at and below 3.0 Å. The estimated rate varies more from
the arbitrary RMSD
CR
cutoff than any fitting errors. Thus we present
a bounded folding time estimate ranging from τ ) 1.5 µs using a
3.0 Å cutoff to 8.7 µs for a 2.5 Å cutoff. We emphasize that the
estimated folding rate assumes two-state folding to completion.
By capturing the folding ensemble very early into folding (tens
of ns), our methodology gives us a detailed view of the unfolded
state under native conditions. Recently, we have postulated that in
certain cases the ensemble averaged mean structure of the unfolded
state may have nativelike topology.
9
The sampling of the unfolded
state of Trp cage in our simulations has allowed us to test this mean
structure hypothesis. At each time point along the folding trajectory
we calculated the C
-C
distance matrix. By averaging these
matrices, we obtained mean matrices representative of the ensemble
at each time point. Since these matrices do not correspond to any
individual structure, we further asked which real, physical conformer
in the simulated ensemble at each time point was closest to these
mean matrices, such that their distance root-mean square deviation
(dRMS) from the mean C
-C
matrix is smallest (dRMS is a
standard metric for comparing two structures, i.e., distance matrices,
defined as dRMS ) [2Σ
i>j
[D
ij
(1) - D
ij
(2)]
2
/n(n - 1)]
1/2
, where
D
ij
(x) refers to the distance between atoms i and j in structure x
and n is the total number of atoms included within each structure.
We find that the mean structure is similar to the NMR structure
of the native state, with dRMS and RMSD
CR
values within the
variation of the native state. For all time points after 10 ns (after
collapse), the mean structure is, on average, closer (RMSD
CR
) to
the NMR structure than 87% of the other ensemble members. In
the most favorable example (at 33 ns) RMSD
CR
) 2.1 Å. Figure 2 * Corresponding author. E-mail: pande@stanford.edu.
Figure 1. Estimated Trp cage folding kinetics. (A) The time-resolved
population of folded molecules in the ensemble of over a thousand folding
simulations for a variety of RMSDCR cutoff values. We fit each of these
population growth curves to the initial stage of a single-exponential folding
reaction to determine rate constants assuming the folding reaction will reach
completion at long times. With 3.0 Å τ ) 1.5 µs, 2.9 Å τ ) 2.0 µs, 2.8 Å
τ ) 3.1 µs, 2.7 Å τ ) 5.5 µs, 2.6 Å τ ) 6.9 µs, 2.5 Å τ ) 8.7 µs.
Published on Web 11/16/2002
14548 9 J. AM. CHEM. SOC. 2002, 124, 14548-14549 10.1021/ja028604l CCC: $22.00 © 2002 American Chemical Society