Distinguish Protein Decoys by Using a Scoring Function Based on a New AMBER Force Field, Short Molecular Dynamics Simulations, and the Generalized Born Solvent Model

Mathew C. Lee and Yong Duan*

Department of Chemistry and Biochemistry and Center of Biomedical Research Excellence in Structural and Functional Genomics, University of Delaware, Newark, Delaware

ABSTRACT Recent work has shown the ability of physics-based potentials (e.g., CHARMM and OPLS-AA) and energy minimization to differentiate native protein structures from large ensembles of non-native structures. In this study, we extended previous work by other authors and developed an energy scoring function using a new set of AMBER parameters (also recently developed in our laboratory) in conjunction with molecular dynamics and the Generalized Born solvent model. We evaluated the performance of our new scoring function by examining its ability to distinguish between native and decoy protein structures. Here we present a systematic comparison of our results with those obtained with other physics-based potentials by previous authors. A total of 7 decoy sets, 117 protein sequences, and more than 41,000 structures were evaluated. The results of our study showed that our new scoring function represents a significant improvement over previously published physics-based scoring functions. Proteins 2004;55:620–634. © 2004 Wiley-Liss, Inc.

Key words: computational structure prediction; deliberately misfolded proteins; potential energy function; z scores; protein folding

INTRODUCTION

Methods of computational protein structure prediction are generally rooted in the thermodynamic hypothesis that the native-state conformation is the most stable conformation and, therefore, must occupy the lowest energetic state.
1 Although there was one recent example in which the native state of the α-lytic protein appeared to be less stable than its molten-globular intermediate and denatured states, closer examination revealed that the enthalpy of the native state is actually lower than that of the misfolded state by about 18 kcal/mol and that the apparent stability of the misfolded state is due to the increase in conformational entropy on unfolding. 2,3 At first glance this observation seems to nullify the underlying assumption of most structure prediction methods, but with proper interpretation it actually supports the thermodynamic hypothesis. Even in this extreme example, in which the native state is kinetically trapped, the effective free energy of the native state (the free energy of the protein plus solvent at a fixed conformation 4,5 ) remains the lowest. However, the ruggedness of the energy landscape also dictates that an ensemble of local minimum-energy states exists around the native state. Therefore, effective energy functions that can accurately depict the energy landscape of protein conformation space are a common requirement for all computational approaches to the prediction problem. This discriminatory requirement was formulated as the “principle of minimum frustration” by Bryngelson and Wolynes. 6

Three major classes of prediction methods are in use today: homology modeling, threading/fold recognition, and ab initio folding. Regardless of which class a prediction method belongs to, an effective energy function is usually required. These functions are typically used in one of two ways: either as optimization criteria that drive conformational search algorithms through the conformational space (the folding problem) or as selection criteria that pick a conformation from a set of possible structures in fold recognition applications (the reverse folding problem).
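
As an illustration of the second use, an energy function acting as a selection criterion, the following minimal Python sketch picks the lowest-energy structure from a candidate set and computes a z score for the native structure against the decoys. The z score is taken here in its conventional form, (E_native − mean of decoy energies) / standard deviation of decoy energies, which is an assumption rather than a definition given in the text above. The function names and the energy values are hypothetical and serve only to show the bookkeeping.

```python
from statistics import mean, pstdev


def select_lowest_energy(energies):
    """Return the name of the structure with the lowest effective energy."""
    return min(energies, key=energies.get)


def native_z_score(native_energy, decoy_energies):
    """Z score of the native energy relative to the decoy energy distribution.

    Assumed conventional definition: (E_native - mean(E_decoys)) / std(E_decoys).
    A large negative value means the native structure lies well below
    the bulk of the decoys.
    """
    sigma = pstdev(decoy_energies)  # population standard deviation
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean(decoy_energies)) / sigma


# Hypothetical effective energies in kcal/mol (illustrative, not from the paper).
energies = {
    "native": -120.0,
    "decoy_1": -100.0,
    "decoy_2": -90.0,
    "decoy_3": -110.0,
}

best = select_lowest_energy(energies)
decoys = [e for name, e in energies.items() if name != "native"]
z = native_z_score(energies["native"], decoys)
print(best, round(z, 2))  # prints: native -2.45
```

In this toy case the native structure is both the lowest-energy candidate and well separated from the decoy distribution. A scoring function that discriminates poorly would tend to give a z score near zero or rank a decoy first.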
*Correspondence to: Yong Duan, Department of Chemistry and Biochemistry and Center of Biomedical Research Excellence in Structural and Functional Genomics, University of Delaware, Newark, DE 19716. E-mail: yduan@udel.edu

Received 24 January 2003; Accepted 14 March 2003

Published online 1 April 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.10470

PROTEINS: Structure, Function, and Bioinformatics 55:620–634 (2004) © 2004 WILEY-LISS, INC.

Although the exact design of an effective energy function depends on the type of problem one wants to tackle, several factors constrain the final form that an energy function ultimately assumes. 7 For example, in fold recognition applications, one is primarily concerned with the backbone geometry of a protein; thus, one has greater freedom to simplify the representation of the side-chains. In homology modeling or molecular dynamics-based ab initio folding, in which atomic details are required, a reduced representation may not be sufficient; molecular mechanics force field-based energy functions that account for full atomic details are thought to be better suited for such applications. Depending on the method from which it is derived, an energy function is classified as one of three types: knowledge or statistics-