Protein Folding Force Fields: Optimisation and Optimism

Mark Abraham, James B. Procter, Thomas Huber, Zsuzsanna Dosztanyi and Andrew E. Torda

Research School of Chemistry, Australian National University, Canberra ACT 0200, Australia, Andrew.Torda@anu.edu.au

Abstract - Force fields represent the interactions between the atoms or sites within a molecule of interest. Normally, they are a model for the physical interactions within the system one is studying. If, however, one is willing to forget physics, one may be able to pervert force field methodology for some other purpose. One can use quite arbitrary functional forms without an obvious physical basis. This work shows how the parameters can then be optimised so as to allow protein sequences to computationally find suitable protein structures. Mathematically, the interaction functions are force fields, but in practice they are a form of sequence-structure compatibility function.

INTRODUCTION

Protein structure prediction has been a popular activity for literally decades. Twenty years ago, X-ray crystallography was slow and expensive and no structures had been determined by nuclear magnetic resonance spectroscopy (NMR). Brave theoreticians attempted to predict protein structures from sequence information, hoping to bypass experiment. These days, crystallography has become faster and there are many NMR structures known, but genome scale sequencing has meant that the rate at which protein sequence data accumulates far outstrips the rate of structure determination.

If a protein sequence is similar to one of known structure, comparative modelling will produce reliable answers. If there is no structure with clear sequence similarity, one may resort to energy based calculations. Underlying most calculations is the assumption that proteins spend most of their time in regions of conformational space of lowest free energy.
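
The assumption that the lowest free energy region dominates can be illustrated with a small numerical sketch. It uses the standard Boltzmann weighting, in which the population of a state is proportional to exp(-G/kT). The function name `boltzmann_populations` and the three free energy values are hypothetical and chosen purely for illustration; they do not come from this work.

```python
import math

# Molar gas constant in kcal/(mol K), so free energies are given in kcal/mol
K_B = 0.0019872041


def boltzmann_populations(free_energies, temperature=300.0):
    """Return the equilibrium population of each state given its free energy."""
    kt = K_B * temperature
    # Shift by the minimum so the largest weight is exactly 1 (numerical safety)
    g_min = min(free_energies)
    weights = [math.exp(-(g - g_min) / kt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical states: a lowest free energy state and two alternatives that
# lie 2 and 4 kcal/mol higher (illustrative values only)
populations = boltzmann_populations([0.0, 2.0, 4.0])
for g, p in zip([0.0, 2.0, 4.0], populations):
    print(f"G = {g:.1f} kcal/mol  population = {p:.4f}")
```

Under these assumed values, a gap of only a few kcal/mol is enough for the lowest state to hold the large majority of the population at room temperature, which is why the search for the free energy minimum is treated as the central problem.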
If this is true, then one should be able to model the potential energy of the system, add in entropic contributions and solve the problem by searching available configurations. In practice, the problem has completely resisted computational attacks. Furthermore, it is not even clear just how close the state of the art is to a solution.

On the one side, there is the problem of searching through available configurations. It is hard to overestimate the size of this task. Considering only a protein's backbone, one could make the rash oversimplification of saying that there are three or four likely conformations per residue. This means that for an N residue protein, there are far more than 3^N possibilities one would have to consider. In practice, conformational space is continuous, there are more than three conformations per residue and we have not even considered sidechain conformations.

On the other side, there is the model for potential energy. In molecular mechanics, this model is atomistic (one interaction site per atom). This is certainly useful for detecting energy changes due to small shifts of coordinates, but may not be best for comparing very different but plausible conformations.

Here, we describe approaches to both of these problems. From the point of view of searching, one could divide approaches into two classes. Either one wants to model nature and the folding process, or one simply wishes to find the best answer. Consider Fig. 1.

[Fig. 1. Physical and non-physical approaches to predicting structure. Panels: non-physical; physical.]

One could try to model the natural folding pathway as shown in the top panel, and proteins have been subjected to methods ranging from molecular dynamics simulations to Monte Carlo calculations. If these found the correct answer, they would have the bonus of producing statistical mechanically valid (Boltzmann) distributions and incorporating entropy in a totally natural manner. Because the search problem is so difficult,