Pathway Heterogeneity in Protein Folding

Ariel Fernández 1,2,* and Andrés Colubri 1

1 Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois
2 Instituto de Matemática, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas, Bahía Blanca, Argentina

ABSTRACT We generate ab initio folding pathways in two single-domain proteins, hyperthermophile variant of protein G domain (1gb4) and ubiquitin (1ubi), both presumed to be two-state folders. Both proteins are endowed with the same topology but, as shown in this work, rely to a different extent on large-scale context to find their native folds. First, we demonstrate a generic feature of two-state folders: A downsizing of structural fluctuations is achieved only when the protein reaches a stationary plateau maximizing the number of highly protected hydrogen bonds. This enables us to identify the folding nucleus and show that folding does not become expeditious until a topology is generated that is able to protect intramolecular hydrogen bonds from water attack. Pathway heterogeneity is shown to be dependent on the extent to which the protein relies on large-scale context to fold, rather than on contact order: Proteins that can only stabilize native secondary structure by packing it against scaffolding hydrophobic moieties are meant to have a heterogeneous transition-state ensemble if they are to become successful folders (otherwise, successful folding would be too fortuitous an event). We estimate mutational Φ values as ensemble averages and deconvolute individual-route contributions to the averaged two-state kinetic picture. Our results find experimental corroboration in the well-studied chymotrypsin inhibitor (CI2), while leading to verifiable predictions for the other two study cases. Proteins 2002;48:293–310. © 2002 Wiley-Liss, Inc.
Key words: protein G domain; ubiquitin; two-state folders; folding timescales

INTRODUCTION

A core question of the postgenomic era is: What routes can a protein molecule follow to fold into its native state within biological time constraints? 1–12 We need to address this question to make sense of kinetic measurements and properties that are necessarily ensemble averages and therefore do not reveal the folding behavior of individual molecules. 2, 4, 9 An assessment of pathway heterogeneity, with a dissection of the ensemble-average kinetic picture at the single-molecule level, might be desirable on the following accounts:

1. To determine the extent of cooperativity 3, 9, 11–17 required for individual chains to generate nascent folds in different large-scale contexts.
2. To understand the role of individual residues in the folding process 11–25 and predict Φ values 26, 27 on individual pathways, dissecting the effect of site mutation on each of the different routes contributing to the transition state (TS) ensemble. Experimental Φ values invariably represent ensemble averages over folding routes.
3. To interpret experimentally measured fractional Φ values by weighting the individual contributions to the TS ensemble.
4. To dissect the two-state kinetics picture for single-domain proteins, 22, 28–31 an ensemble-average scenario, by contrasting it with the information gathered on the different explorations of the single-molecule potential energy surface. 10, 11, 25

The ab initio treatment presented in this work addresses these issues to some extent. The aim of this article is to illustrate how the research program described above may be realized by using a folding simulator that on occasion sacrifices structural resolution to make folding timescales computationally accessible.
The algorithm used to generate coarsely defined pathways has already been shown to represent a rigorous projection of the torsional mechanics of the chain 1 and has been used to predict the TS for a nonhierarchical folder, β-lactoglobulin, 19 a result that has recently been corroborated experimentally. 32 Two illustrative cases have been selected because of their widely different extents of pathway heterogeneity, concertedness, and cooperativity, even though their native folds share the same topology: a thermophile variant of the protein G domain [Protein Data Bank (PDB) accession code: 1gb4] 33 and mammalian ubiquitin (Ub, PDB accession code: 1ubi). 34, 35 Our results support the two-state kinetics picture for both proteins (such a scenario has raised some controversy in the case of Ub 34, 35). They also reveal that, for approximately the same contact order, a wider pathway diversity is expected for a successful folder in which the structural propensities leading to the native fold are dictated by a large-scale context, as is the case with Ub when contrasted with 1gb4. This result may be rationalized because the occurrence of a unique con-

*Correspondence to: Ariel Fernández, Institute for Biophysical Dynamics, CLS 439, The University of Chicago, Chicago, IL 60637. E-mail: ariel@uchicago.edu
Received 12 January 2002; Accepted 5 March 2002
Published online 00 Month 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.10155
PROTEINS: Structure, Function, and Genetics 48:293–310 (2002) © 2002 WILEY-LISS, INC.