Pathway Heterogeneity in Protein Folding
Ariel Fernández(1,2,*) and Andrés Colubri(1)
(1) Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois
(2) Instituto de Matemática, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas, Bahía Blanca, Argentina
ABSTRACT We generate ab initio folding pathways in two single-domain proteins, the hyperthermophile variant of protein G domain (1gb4) and ubiquitin (1ubi), both presumed to be two-state folders. Both proteins have the same topology but, as shown in this work, rely to different extents on large-scale context to find their native folds. First, we demonstrate a generic feature of two-state folders: structural fluctuations are reduced only when the protein reaches a stationary plateau that maximizes the number of highly protected hydrogen bonds. This enables us to identify the folding nucleus and to show that folding does not become expeditious until a topology is generated that can protect intramolecular hydrogen bonds from water attack. Pathway heterogeneity is shown to depend on the extent to which the protein relies on large-scale context to fold, rather than on contact order: proteins that can stabilize native secondary structure only by packing it against scaffolding hydrophobic moieties must have a heterogeneous transition-state ensemble if they are to become successful folders (otherwise, successful folding would be too fortuitous an event). We estimate mutational Φ values as ensemble averages and deconvolute individual-route contributions to the averaged two-state kinetic picture. Our results find experimental corroboration in the well-studied chymotrypsin inhibitor 2 (CI2), while leading to verifiable predictions for the other two study cases.
Proteins 2002;48:293–310. © 2002 Wiley-Liss, Inc.
Key words: protein G domain; ubiquitin; two-state folders; folding timescales
INTRODUCTION
A core question of the postgenomic era is which routes a protein molecule can follow to fold into its native state within biological time constraints [1–12]. We need to address this question to make sense of kinetic measurements and properties that are necessarily ensemble averages and therefore do not reveal the folding behavior of individual molecules [2,4,9]. An assessment of pathway heterogeneity, with a dissection of the ensemble-average kinetic picture at the single-molecule level, would be desirable for the following reasons:
1. To determine the extent of cooperativity [3,9,11–17] required for individual chains to generate nascent folds in different large-scale contexts.
2. To understand the role of individual residues in the folding process [11–25] and to predict Φ values [26,27] on individual pathways, dissecting the effect of site mutation on each of the different routes contributing to the transition state (TS) ensemble. Experimental Φ values invariably represent ensemble averages over folding routes.
3. To interpret experimentally measured fractional Φ values by weighting the individual contributions to the TS ensemble.
4. To dissect the two-state kinetics picture for single-domain proteins [22,28–31], an ensemble-average scenario, by contrasting it with the information gathered from different explorations of the single-molecule potential energy surface [10,11,25].
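Items 2 and 3 above relate per-route Φ values to the ensemble average that experiments report. The following is a minimal Python sketch of that relation, assuming the measured Φ value is a linear, population-weighted average of per-route Φ values (a first-order approximation for small mutational perturbations, not necessarily the weighting scheme used in this work). The names `phi_value`, `Route`, and `ensemble_phi` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


def phi_value(ddg_ts: float, ddg_eq: float) -> float:
    """Standard Phi value: ddG(TS - U) / ddG(N - U) for a single mutation."""
    if ddg_eq == 0:
        raise ValueError("mutation must change the folding free energy")
    return ddg_ts / ddg_eq


@dataclass
class Route:
    """Hypothetical record for one folding route in the TS ensemble."""
    weight: float  # assumed: fraction of successful trajectories on this route
    phi: float     # Phi value of the mutated residue along this route


def ensemble_phi(routes: Sequence[Route]) -> float:
    """Population-weighted average of per-route Phi values.

    Assumption: the observed Phi is a linear average over routes, which
    holds only to first order for small mutational perturbations.
    """
    total = sum(r.weight for r in routes)
    if total <= 0:
        raise ValueError("route weights must sum to a positive number")
    return sum(r.weight * r.phi for r in routes) / total


# Residue fully structured in the TS of one route, unstructured in the other.
routes = [Route(weight=0.6, phi=1.0), Route(weight=0.4, phi=0.0)]
print(ensemble_phi(routes))  # 0.6
```

In this toy case, a residue with Φ = 1 on one route and Φ = 0 on the other yields a fractional ensemble value of 0.6, illustrating how a fractional Φ need not imply partial structure on any single route.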
The ab initio treatment presented in this work partially addresses these issues. The aim of this article is to illustrate how the research program described above may be realized by using a folding simulator that occasionally sacrifices structural resolution to make folding timescales computationally accessible. The algorithm used to generate coarsely defined pathways has already been shown to represent a rigorous projection of the torsional mechanics of the chain [1] and has been used to predict the TS for a nonhierarchical folder, β-lactoglobulin [19], a result that has recently been corroborated experimentally [32].
Two illustrative cases have been selected because of their widely different extents of pathway heterogeneity, concertedness, and cooperativity, even though their native folds share the same topology: the thermophile variant of protein G domain [Protein Data Bank (PDB) accession code: 1gb4] [33] and mammalian ubiquitin (Ub, PDB accession code: 1ubi) [34,35]. Our results support the two-state kinetics picture for both proteins (a scenario that has been somewhat controversial in the case of Ub [34,35]). They also reveal that, for approximately the same contact order, a wider pathway diversity is expected for a successful folder in which the structural propensities leading to the native fold are dictated by a large-scale context, as is the case with Ub when contrasted with 1gb4. This result may be rationalized because the occurrence of a unique con-
*Correspondence to: Ariel Fernández, Institute for Biophysical Dynamics, CLS 439, The University of Chicago, Chicago, IL 60637. E-mail: ariel@uchicago.edu
Received 12 January 2002; Accepted 5 March 2002
Published online 00 Month 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.10155
PROTEINS: Structure, Function, and Genetics 48:293–310 (2002)
© 2002 WILEY-LISS, INC.