Ab Initio Prediction of Peptide-MHC Binding Geometry for
Diverse Class I MHC Allotypes
Andrew J. Bordner¹,²,* and Ruben Abagyan¹
¹ Department of Molecular Biology, The Scripps Research Institute, San Diego, California
² Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
ABSTRACT Since determining the crystallographic structure of all peptide-MHC
complexes is infeasible, accurate prediction of their conformation is a critical
computational problem. Such structural models can be useful for determining
binding energetics, predicting the structures of specific ternary complexes with
T-cell receptors, and designing new molecules interacting with these complexes.
The main difficulties are (1) adequate sampling of the large number of
conformational degrees of freedom for the flexible peptide, (2) predicting subtle
changes in the MHC interface geometry upon binding, and (3) building models for
numerous MHC allotypes without known structures. Whereas previous studies have
approached the sampling problem by dividing the conformational variables into
different sets and predicting them separately, we have refined the
Biased-Probability Monte Carlo docking protocol in internal coordinates to
optimize a physical energy function for all peptide variables simultaneously.
We also imitated the induced fit by docking into a more permissive smooth grid
representation of the MHC followed by refinement and reranking using an all-atom
MHC model. Our method was tested by a comparison of the results of cross-docking
14 peptides into HLA-A*0201 and 9 peptides into H-2Kᵇ as well as docking peptides
into homology models for five different HLA allotypes with a comprehensive set of
experimental structures. The surprisingly accurate prediction (0.75 Å backbone
RMSD) for cross-docking of a highly flexible decapeptide, dissimilar to the
original bound peptide, as well as docking predictions using homology models for
two allotypes with low average backbone RMSDs of less than 1.0 Å illustrate the
method’s effectiveness.
Finally, energy terms calculated using the predicted structures were combined
with supervised learning on a large data set to classify peptides as either
HLA-A*0201 binders or nonbinders. In contrast with sequence-based prediction
methods, this model was also able to predict the binding affinity for peptides to
a different MHC allotype (H-2Kᵇ), not used for training, with comparable
prediction accuracy. Proteins 2006;63:512–526. © 2006 Wiley-Liss, Inc.
Key words: peptide docking; major histocompatibility complex (MHC); Monte Carlo
optimization; homology models; potential grid; peptide binding prediction
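The backbone RMSD values quoted in the abstract measure how far a predicted peptide conformation lies from the experimental one. The following is a minimal Python sketch of that quantity, not the authors' implementation. It assumes that both sets of backbone atom coordinates are already in the same reference frame (for example after superimposing the MHC molecules, which is an assumption here) and that atoms are listed in matching order. The function name `backbone_rmsd` and the toy coordinates are hypothetical.

```python
import math


def backbone_rmsd(predicted, reference):
    """Root-mean-square deviation (in the input units, e.g. Angstroms)
    between two equal-length lists of (x, y, z) backbone atom coordinates.

    No superposition is performed: both structures are assumed to share
    one coordinate frame already (an assumption of this sketch).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared_sum += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared_sum / len(predicted))


# Toy coordinates, not taken from any real structure: every predicted
# atom is displaced by 0.5 along y relative to the reference.
reference_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted_atoms = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]
toy_rmsd = backbone_rmsd(predicted_atoms, reference_atoms)
```

Because every atom in the toy example is shifted by the same distance, the RMSD equals that distance. In a real comparison the deviations vary along the peptide, and the RMSD summarizes them in a single number.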
INTRODUCTION
The binding of short peptide fragments of endogenous and foreign proteins to
class I major histocompatibility complex (MHC) glycoproteins is a necessary first
step in immune surveillance by circulating cytotoxic T-cells. Peptides resulting
from proteasomal processing of cytosolic proteins are transported to the
endoplasmic reticulum by the transporter associated with antigen processing
(TAP), where they bind to newly synthesized MHC molecules. The resulting complex
is then transported to the cell surface, where the MHC is inserted into the
membrane. These complexes are then recognized by CD8⁺ T-cells through peptide and
MHC allele-specific interactions with the T-cell receptor (TCR) as well as
conserved interactions with the CD8 coreceptor.
MHC molecules are polymorphic, with most variable residues in the peptide binding
pocket, so that each allotype preferentially binds a distinct subset of peptides.
Since, for example, an individual human can have cells expressing up to six
different allotypes, this diversity presumably prevents potential antigens from
escaping recognition by the cellular immune system. Also, a particular MHC
allotype can strongly bind a large number of 8–11 residue peptides. Although most
have preferred residue types in primary or secondary anchor positions, this is
neither necessary nor sufficient for strong binding.¹,² This extreme variability
in both components of the peptide-MHC complex, together with the limited number
of available X-ray structures, makes computational prediction of the complex an
important goal in molecular biology.
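The notion of preferred residue types at anchor positions can be illustrated with a short Python sketch. The anchor table below is a simplified, hypothetical stand-in for a real motif: leucine or methionine at position 2 and valine, leucine, or isoleucine at the C-terminus for HLA-A*0201. The table, the function name `matches_anchor_motif`, and the example peptides are assumptions made for illustration only. As the text stresses, passing or failing such a check is neither necessary nor sufficient for strong binding.

```python
# Simplified, hypothetical anchor preferences (illustration only).
# Keys are 0-based string indices: 1 is peptide position 2, -1 is the C-terminus.
ANCHOR_PREFERENCES = {
    "HLA-A*0201": {1: set("LM"), -1: set("VLI")},
}


def matches_anchor_motif(peptide, allotype="HLA-A*0201"):
    """Return True if an 8-11 residue peptide has a preferred residue
    at every anchor position listed for the given allotype."""
    if not 8 <= len(peptide) <= 11:
        return False
    preferences = ANCHOR_PREFERENCES[allotype]
    return all(peptide[index] in allowed for index, allowed in preferences.items())


# Two nonamers used purely as example strings.
first_result = matches_anchor_motif("SLYNTVATL")   # L at position 2, L at C-terminus
second_result = matches_anchor_motif("GILGFVFTL")  # I at position 2 is not in the table
```

The first peptide satisfies the strict motif and the second does not, because isoleucine at position 2 is absent from this deliberately narrow table. A filter this simple says nothing about binding strength, which is one motivation for structure-based prediction.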
Accurate models of peptides bound to MHC are essential for structure-based
prediction of peptide binding affinities. Position-specific scoring matrices³⁻⁵
and machine learning methods⁶⁻⁹ can predict peptide-MHC binding affinity
reasonably accurately when a large amount of experimental
The Supplementary Material referred to in this article can be found
at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
*Correspondence to: Andrew J. Bordner, Computer Science and
Mathematics Division, Oak Ridge National Laboratory, P.O. Box
2008, MS 6173, Oak Ridge, TN 37831. E-mail: bordner@ornl.gov
Grant sponsor: National Institutes of Health; Grant number: 1R01GM071872-01.
Grant sponsor: U.S. Department of Energy Genomics: GTL and Biopilot grants; Grant
number: ORNL is operated under DOE contract number DE-AC05-00OR22725.
Received 3 May 2005; Revised 12 September 2005; Accepted 11
October 2005
Published online 7 February 2006 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.20831
PROTEINS: Structure, Function, and Bioinformatics 63:512–526 (2006)
© 2006 WILEY-LISS, INC.