A Rigorous and Efficient Method To Reweight Very Large Conformational Ensembles Using Average Experimental Data and To Determine Their Relative Information Content
Hoi Tik Alvin Leung, Olivier Bignucolo, Regula Aregger,§ Sonja A. Dames, Adam Mazur, Simon Bernèche, and Stephan Grzesiek*
Focal Area Structural Biology and Biophysics, Biozentrum, SIB Swiss Institute of Bioinformatics, University of Basel, CH-4056 Basel, Switzerland
§ Institut für Biochemie, University of Leipzig, D-04103 Leipzig, Germany
Department of Chemistry, Technische Universität München, D-85748 Garching, Germany
Institute of Structural Biology, Helmholtz Zentrum München, D-85764 Neuherberg, Germany
Supporting Information
ABSTRACT: Flexible polypeptides such as unfolded proteins may access an astronomical number of conformations. The most advanced simulations of such states usually comprise tens of thousands of individual structures. In principle, a comparison of parameters predicted from such ensembles to experimental data provides a measure of their quality. In practice, analyses that go beyond the comparison of unbiased average data have been impossible to carry out on the entirety of such very large ensembles and have, therefore, been restricted to much smaller subensembles and/or nondeterministic algorithms. Here, we show that such very large ensembles, on the order of 10⁴ to 10⁵ conformations, can be analyzed in full by a maximum entropy fit to experimental average data. Maximizing the entropy of the population weights of individual conformations under experimental χ² constraints is a convex optimization problem, which can be solved in a very efficient and robust manner to a unique global solution even for very large ensembles. Since the population weights can be determined reliably, the reweighted full ensemble presents the best model of the combined information from simulation and experiment.
Furthermore, since the reduction of entropy due to the experimental constraints is well-defined, its value provides a robust measure of the information content of the experimental data relative to the simulated ensemble and an indication for the density of the sampling of conformational space. The method is applied to the reweighting of a 35 000 frame molecular dynamics trajectory of the nonapeptide EGAAWAASS by extensive NMR ³J coupling and RDC data. The analysis shows that RDCs provide significantly more information than ³J couplings and that a discontinuity in the RDC pattern at the central tryptophan is caused by a cluster of helical conformations. Reweighting factors are moderate and consistent with errors in MD force fields of less than 3kT. The required reweighting is larger for an ensemble derived from a statistical coil model, consistent with its coarser nature. We call the method COPER, for convex optimization for ensemble reweighting. Similar advantages of large-scale efficiency and robustness can be obtained for other ensemble analysis methods with convex targets and constraints, such as constrained χ² minimization and the maximum occurrence method.
Received: August 7, 2015. Published: November 2, 2015. pubs.acs.org/JCTC. © 2015 American Chemical Society. DOI: 10.1021/acs.jctc.5b00759. J. Chem. Theory Comput. 2016, 12, 383–394.

INTRODUCTION
Proteins exist as ensembles of interchanging conformations. Obviously, unfolded polypeptide chains, such as chemically or physically denatured proteins and intrinsically disordered proteins (IDPs), can access an extremely large number of conformations.¹ A comprehensive description of their structural preferences is a prerequisite for understanding protein folding and the function of IDPs in health and disease.² However, native, folded proteins also usually adopt many conformations close to the global free energy minimum,³ and their interchange is a hallmark of protein function, such as catalysis⁴ or signal transduction.⁵ A detailed experimental determination of individual structures in such protein ensembles becomes impossible as soon as their number exceeds a few, since the number of conformational degrees of freedom quickly outpaces the number of measurable parameters.⁶ To make progress,