Distinguish Protein Decoys by Using a Scoring Function
Based on a New AMBER Force Field, Short Molecular
Dynamics Simulations, and the Generalized Born
Solvent Model
Mathew C. Lee and Yong Duan*
Department of Chemistry and Biochemistry and Center of Biomedical Research Excellence in Structural and Functional
Genomics, University of Delaware, Newark, Delaware
ABSTRACT Recent work has shown the ability of
physics-based potentials (e.g., CHARMM and
OPLS-AA) combined with energy minimization to
differentiate native protein structures from large
ensembles of non-native structures. In this study, we
extended previous work by other authors and developed
an energy scoring function using a new set of AMBER
parameters (also recently developed in our laboratory)
in conjunction with molecular dynamics and
the Generalized Born solvent model. We evaluated
the performance of our new scoring function by
examining its ability to distinguish between
native and decoy protein structures. Here we present
a systematic comparison of our results with those
obtained by previous authors using other physics-based
potentials. A total of 7 decoy sets, 117
protein sequences, and more than 41,000 structures
were evaluated. The results of our study showed that
our new scoring function represents a significant
improvement over previously published physics-based
scoring functions. Proteins 2004;55:620–634.
© 2004 Wiley-Liss, Inc.
Key words: computational structure prediction;
deliberately misfolded proteins; potential
energy function; z scores; protein folding
INTRODUCTION
Methods of computational protein structure prediction
are generally rooted in the thermodynamic hypothesis
that the native-state conformation is the most stable
conformation and, therefore, must occupy the lowest
energetic state.[1] Although there was one recent example in
which the native state of the α-lytic protease appeared to be
less stable than its molten-globular intermediate and
denatured states, closer examination revealed that the
enthalpy of the native state is actually lower than that of the
misfolded state by about 18 kcal/mol and that the apparent
stability of the misfolded state is due to the increase in
conformational entropy on unfolding.[2,3]
This observation
at first glance seems to nullify the underlying assumption
of most structure prediction methods, but with proper
interpretation, it actually supports the thermodynamic
hypothesis. Even in this extreme example in which
the native state is kinetically trapped, the effective free
energy of the native state (the free energy of the protein
plus solvent at a fixed conformation[4,5]) remains the lowest.
However, the ruggedness of the energy landscape also
dictates that an ensemble of local minimum energy states
around the native state exists. Therefore, effective energy
functions that can accurately depict the energy landscape
of protein conformation space are a common requirement
for all computational approaches to the prediction problem.
This discriminatory requirement was formulated as
the “principle of minimum frustration” by Bryngelson and
Wolynes.[6]
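As a concrete illustration of this discriminatory requirement, the short Python sketch below scores a native structure against a set of decoys using a z score and an energy rank. It is only a sketch under stated assumptions: the z-score convention (native energy minus the decoy mean, divided by the population standard deviation of the decoy energies) is one common choice and is not taken from this section, the function names `native_z_score` and `native_rank` are hypothetical, and the energies are invented for illustration.

```python
from statistics import mean, pstdev


def native_z_score(native_energy, decoy_energies):
    """Z score of the native energy relative to the decoy energies.

    Assumed convention: (E_native - mean(E_decoy)) / std(E_decoy),
    using the population standard deviation of the decoys.
    A more negative value means better separation of the native state.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma


def native_rank(native_energy, decoy_energies):
    """1-based energy rank of the native structure (1 = lowest energy)."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


# Hypothetical energies in kcal/mol, invented for illustration only.
decoys = [-95.0, -100.0, -105.0, -90.0, -110.0]
native = -120.0

z = native_z_score(native, decoys)
rank = native_rank(native, decoys)
print(f"z score = {z:.2f}, rank = {rank}")
```

An energy function that satisfies the requirement would place the native structure at rank 1 with a strongly negative z score for every decoy set.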
Three major classes of prediction methods are in use
today: homology modeling, threading/fold recognition, and
ab initio folding. Regardless of which class a prediction
method belongs to, an effective energy function is usually
required. These functions are typically used in one of two
ways: either as optimization criteria that drive
conformational search algorithms through the conformational
space (the folding problem) or as selection criteria for
choosing a conformation from a set of possible structures
in fold recognition applications (the reverse folding
problem). Although the exact design of an
effective energy function depends on the type of problem
one wants to tackle, several factors constrain the final
form that an energy function ultimately assumes.[7] For
example, in fold recognition applications, one is primarily
concerned with the backbone geometry of a protein; thus,
one is afforded greater freedom in simplifying the
representation of the side-chains. In homology modeling or
molecular dynamics-based ab initio folding in which atomic
details are required, reduced representation may not be
sufficient; molecular mechanics force field-based energy
functions that account for full atomic details are thought to
be better suited for such applications. Depending on the
method from which an energy function is derived, it is
classified as one of three types: knowledge or statistics-
*Correspondence to: Yong Duan, Department of Chemistry and
Biochemistry and Center of Biomedical Research Excellence in
Structural and Functional Genomics, University of Delaware,
Newark, DE 19716. E-mail: yduan@udel.edu
Received 24 January 2003; Accepted 14 March 2003
Published online 1 April 2004 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.10470
PROTEINS: Structure, Function, and Bioinformatics 55:620 – 634 (2004)