Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory†
William C. Swope* and Jed W. Pitera
IBM Almaden Research Center, 650 Harry Road, San Jose, California 95120
Frank Suits
IBM Watson Research Center, Route 134, Yorktown Heights, New York 10598
Received: November 10, 2003; In Final Form: February 21, 2004
A rigorous formalism for the extraction of state-to-state transition functions from a Boltzmann-weighted
ensemble of microcanonical molecular dynamics simulations has been developed as a way to study the kinetics
of protein folding in the context of a Markov chain. Analysis of these transition functions for signatures of
Markovian behavior is described. The method has been applied to an example problem that is based on an
underlying Markov process. The example problem shows that when an instance of the process is analyzed
under the assumption that the underlying states have been aggregated into macrostates, a procedure known
as lumping, the resulting chain appears to have been produced by a non-Markovian process when viewed at
high temporal resolution. However, when viewed on longer time scales, and for appropriately lumped
macrostates, Markovian behavior can be recovered. The potential for extracting the long time scale behavior
of the folding process from a large number of short, independent molecular dynamics simulations is also
explored.
1. Introduction
An understanding of the mechanisms by which proteins fold
would have wide utility in many areas, ranging from the
development of effective treatments for protein-folding-related
diseases to exploitation of the underlying principles of folding
to facilitate industrial nanotechnology. The study of protein
folding has three aspects: thermodynamics, kinetics, and
structure prediction. In this work we introduce an approach to
characterizing some aspects of protein folding kinetics and apply
it to a simple example problem. In a companion paper,¹ we
apply the approach to the folding of a small peptide, the
C-terminal β-hairpin motif from protein G.
Protein folding has been extensively studied experimentally²⁻⁶
and by computer simulation.⁷⁻¹² Computer simulations can
provide information about the process that is highly
complementary to that obtained from experiment.⁸,¹³⁻¹⁷ Furthermore,
the computer power available for biomolecular simulations in
general, and protein folding in particular, is increasing through
the production of improved software to exploit parallelism,¹⁸
specialized hardware,¹⁹ larger and faster computer systems and
grid and distributed computing approaches.²⁰⁻²³ Indeed, the IBM
BlueGene project,²⁴⁻²⁷ to build a massively parallel computer
to investigate biomolecular processes such as protein folding,
is expected to systematically study a variety of peptide and small
protein systems and will produce very large volumes of
simulation data. One significant advantage of this greater
computer power is that the field is moving from studies that
report on single events observed during single trajectories of
limited duration,⁷ to studies where extensive thermodynamic
sampling has been performed¹¹⁻¹³,²⁸⁻³⁰ and ensembles of
trajectories are produced and analyzed.⁸,⁹ Obtaining large
numbers of independent trajectories is not only a very effective
way to use parallel computing technologies but is required for
statistically meaningful and reproducible results.³¹ Because of
this move to more comprehensive simulations, new and
automatable analysis procedures that can be applied consistently to
data from simulations of a variety of protein systems need to
be developed and validated.
Protein folding is generally studied in the liquid phase, where
the protein or peptide is in contact with a solvent. Besides
providing part of the driving force for the folding process,
through hydrophobic and hydrophilic hydration, the solvent also
provides friction and a heat bath for the process. In fact, because
of the random forces exerted by the solvent, one would expect
that if several identical peptides could be prepared in the same
conformation and solvated, they would very likely adopt
different folding trajectories, perhaps following completely
different paths and taking different amounts of time to reach
native conformations. Because folding is stochastic in this way,
one should be careful not to draw strong conclusions about the
process from single molecular dynamics (MD) trajectories.
But given that hundreds of protein simulation
trajectories can be produced, what is the best way to use them
to understand the process of folding? One possible approach,
explored in this work, is to analyze the trajectories to produce
a probability for the evolution of the protein from one
conformational state to another. The formalism associated with
Markov processes and models is, therefore, a natural approach
for this analysis.
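
As a rough illustration of this idea, the following Python sketch
estimates state-to-state transition probabilities by counting
transitions in an ensemble of discrete-state trajectories. It is not
the formalism developed in this paper: the three-state matrix
T_true, the helper names (count_transitions, row_normalize,
simulate_chain), and the use of unweighted counts are all assumptions
made for the example, and the synthetic trajectories merely stand in
for MD trajectories that have already been assigned to
conformational states.

```python
import numpy as np


def count_transitions(trajectories, n_states, lag=1):
    """Count i -> j transitions observed `lag` steps apart in every trajectory."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        traj = np.asarray(traj, dtype=int)
        if traj.size > lag:
            # np.add.at accumulates correctly when an (i, j) pair repeats.
            np.add.at(counts, (traj[:-lag], traj[lag:]), 1.0)
    return counts


def row_normalize(counts):
    """Turn counts into conditional probabilities P(j at t + lag | i at t)."""
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for states that were never visited are left as zeros.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def simulate_chain(T, start, n_steps, rng):
    """Generate one synthetic trajectory from a known transition matrix T."""
    states = [start]
    for _ in range(n_steps):
        states.append(int(rng.choice(len(T), p=T[states[-1]])))
    return states


# Hypothetical three-state model (illustrative values only).
rng = np.random.default_rng(0)
T_true = np.array([[0.90, 0.08, 0.02],
                   [0.10, 0.80, 0.10],
                   [0.02, 0.08, 0.90]])

# Many short, independent trajectories started from random states.
trajectories = [simulate_chain(T_true, int(rng.integers(3)), 50, rng)
                for _ in range(500)]

T_est = row_normalize(count_transitions(trajectories, n_states=3))
print(np.round(T_est, 2))
```

In this sketch every trajectory contributes equally to the counts,
whereas the approach described in this work uses a Boltzmann-weighted
ensemble. The lag argument sets the time interval over which
transitions are counted, so the same trajectories can be examined at
coarser temporal resolution by increasing it.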
Markov models of stochastic processes deal with the temporal
evolution of the state of a system. They are appropriate when
the memory of the system is short, that is, when the evolution
of the system into the near future depends only on its properties
at the current time, and not on any of its prior history. Markov
models can be of several types depending on whether one
discretizes the time domain, the state space, or both. With a
discrete time Markov chain, both the time and space domain
† Part of the special issue “Hans C. Andersen Festschrift”.
* To whom correspondence should be addressed.
6571 J. Phys. Chem. B 2004, 108, 6571-6581
10.1021/jp037421y CCC: $27.50 © 2004 American Chemical Society
Published on Web 04/22/2004