A Rigorous and Efficient Method To Reweight Very Large
Conformational Ensembles Using Average Experimental Data and To
Determine Their Relative Information Content
Hoi Tik Alvin Leung,† Olivier Bignucolo,‡ Regula Aregger,§ Sonja A. Dames,∥,¶ Adam Mazur,† Simon Bernèche,‡ and Stephan Grzesiek*,†
†Focal Area Structural Biology and Biophysics, Biozentrum, ‡SIB Swiss Institute of Bioinformatics, University of Basel, CH-4056 Basel, Switzerland
§Institut für Biochemie, University of Leipzig, D-04103 Leipzig, Germany
∥Department of Chemistry, Technische Universität München, D-85748 Garching, Germany
¶Institute of Structural Biology, Helmholtz Zentrum München, D-85764 Neuherberg, Germany
Supporting Information
ABSTRACT: Flexible polypeptides such as unfolded proteins
may access an astronomical number of conformations. The most
advanced simulations of such states usually comprise tens of
thousands of individual structures. In principle, a comparison of
parameters predicted from such ensembles to experimental data
provides a measure of their quality. In practice, analyses that go
beyond the comparison of unbiased average data have been
impossible to carry out on the entirety of such very large
ensembles and have, therefore, been restricted to much smaller
subensembles and/or nondeterministic algorithms. Here, we show
that such very large ensembles, on the order of 10⁴ to 10⁵
conformations, can be analyzed in full by a maximum entropy fit
to experimental average data. Maximizing the entropy of the
population weights of individual conformations under experimental χ²
constraints is a convex optimization problem, which can be solved in a very efficient and robust manner to a unique global
solution even for very large ensembles. Since the population weights can be determined reliably, the reweighted full ensemble
presents the best model of the combined information from simulation and experiment. Furthermore, since the reduction of
entropy due to the experimental constraints is well-defined, its value provides a robust measure of the information content of the
experimental data relative to the simulated ensemble and an indication for the density of the sampling of conformational space.
The method is applied to the reweighting of a 35 000 frame molecular dynamics trajectory of the nonapeptide EGAAWAASS by
extensive NMR ³J coupling and RDC data. The analysis shows that RDCs provide significantly more information than ³J
couplings and that a discontinuity in the RDC pattern at the central tryptophan is caused by a cluster of helical conformations.
Reweighting factors are moderate and consistent with errors in MD force fields of less than 3kT. The required reweighting is
larger for an ensemble derived from a statistical coil model, consistent with its coarser nature. We call the method COPER, for
convex optimization for ensemble reweighting. Similar advantages of large-scale efficiency and robustness can be obtained for
other ensemble analysis methods with convex targets and constraints, such as constrained χ² minimization and the maximum
occurrence method.
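The convexity argument can be illustrated with a minimal Python sketch. It is not the COPER implementation, and it makes several assumptions that are not taken from the text: instead of imposing a hard χ² constraint, it minimizes the penalized form χ²/2 − θS through its standard Lagrangian dual; the multiplier `theta`, the function names, and the synthetic data are all hypothetical. Because the problem is convex, each value of `theta` corresponds to the maximum entropy solution at the χ² level it attains. The dual has only as many unknowns as there are observables, so the cost per iteration grows linearly with the number of conformations. The entropy reduction is computed here as the relative entropy of the new weights with respect to the initial ones, which is likewise an assumption about the precise definition.

```python
import numpy as np

def maxent_reweight(Y, y_exp, sigma, theta=1.0, w0=None, tol=1e-8, max_iter=100):
    """Illustrative maximum entropy reweighting (penalized form, not COPER itself).

    Y     : (n_conformations, n_observables) predicted values per conformation
    y_exp : (n_observables,) experimental averages
    sigma : (n_observables,) experimental errors
    theta : hypothetical trade-off parameter between chi^2 and entropy
    Solves the convex dual by damped Newton steps; weights are
    w_i ~ w0_i * exp(-sum_k lam_k * Y_ik). Assumes all w0_i > 0.
    """
    Y = np.asarray(Y, dtype=float)
    y_exp = np.asarray(y_exp, dtype=float)
    n, k = Y.shape
    var = np.broadcast_to(np.asarray(sigma, dtype=float) ** 2, (k,))
    w0 = np.full(n, 1.0 / n) if w0 is None else np.asarray(w0, dtype=float) / np.sum(w0)
    log_w0 = np.log(w0)

    def weights(lam):
        a = log_w0 - Y @ lam
        w = np.exp(a - a.max())  # shift by the maximum for numerical stability
        return w / w.sum()

    def dual(lam):
        a = log_w0 - Y @ lam
        m = a.max()
        log_z = m + np.log(np.exp(a - m).sum())
        return log_z + lam @ y_exp + 0.5 * theta * np.sum(var * lam ** 2)

    lam = np.zeros(k)
    for _ in range(max_iter):
        w = weights(lam)
        mean = w @ Y
        grad = y_exp - mean + theta * var * lam
        if np.linalg.norm(grad) < tol:
            break
        Yc = Y - mean
        # Hessian = weighted covariance of the observables + theta * diag(sigma^2)
        hess = (Yc * w[:, None]).T @ Yc + theta * np.diag(var)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, dual(lam)
        for _halving in range(40):  # backtracking line search (Armijo condition)
            if dual(lam - t * step) <= f0 - 1e-4 * t * (grad @ step):
                break
            t *= 0.5
        lam = lam - t * step
    return weights(lam), lam

def chi2(w, Y, y_exp, sigma):
    """chi^2 between weighted ensemble averages and experimental values."""
    return float(np.sum(((w @ Y - y_exp) / sigma) ** 2))

def entropy_reduction(w, w0):
    """Relative entropy sum_i w_i ln(w_i / w0_i) of new versus initial weights."""
    mask = w > 0
    return float(np.sum(w[mask] * np.log(w[mask] / w0[mask])))

# Synthetic toy data (hypothetical): 5000 conformations, 3 observables.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5000, 3))
y_exp = np.array([0.3, -0.2, 0.1])
sigma = np.array([0.1, 0.1, 0.1])
theta = 1.0
w0 = np.full(len(Y), 1.0 / len(Y))

w, lam = maxent_reweight(Y, y_exp, sigma, theta=theta)
print("chi2 before:", chi2(w0, Y, y_exp, sigma))
print("chi2 after: ", chi2(w, Y, y_exp, sigma))
print("entropy reduction:", entropy_reduction(w, w0))
```

Lowering `theta` tightens the fit to the data at the cost of a larger entropy reduction; scanning it traces the trade-off between χ² and entropy.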
■ INTRODUCTION
Proteins exist as ensembles of interchanging conformations.
Obviously, unfolded polypeptide chains, such as chemically or
physically denatured proteins and intrinsically disordered
proteins (IDPs), can access an extremely large number of
conformations.¹ A comprehensive description of their
structural preferences is a prerequisite for understanding
protein folding and the function of IDPs in health and
disease.² However, native, folded proteins also usually adopt
many conformations close to the global free energy
minimum,³ and their interchange is a hallmark of protein
function, such as catalysis⁴ or signal transduction.⁵
A detailed experimental determination of individual
structures in such protein ensembles becomes impossible as
soon as their number exceeds a few, since the number of
conformational degrees of freedom quickly outpaces the
number of measurable parameters.⁶ To make progress,
Received: August 7, 2015
Published: November 2, 2015
pubs.acs.org/JCTC
© 2015 American Chemical Society. DOI: 10.1021/acs.jctc.5b00759
J. Chem. Theory Comput. 2016, 12, 383−394