Ab Initio Prediction of Peptide-MHC Binding Geometry for Diverse Class I MHC Allotypes

Andrew J. Bordner1,2,* and Ruben Abagyan1

1 Department of Molecular Biology, The Scripps Research Institute, San Diego, California
2 Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee

ABSTRACT Since determining the crystallographic structure of all peptide-MHC complexes is infeasible, an accurate prediction of the conformation is a critical computational problem. These models can be useful for determining binding energetics, predicting the structures of specific ternary complexes with T-cell receptors, and designing new molecules interacting with these complexes. The main difficulties are (1) adequate sampling of the large number of conformational degrees of freedom for the flexible peptide, (2) predicting subtle changes in the MHC interface geometry upon binding, and (3) building models for numerous MHC allotypes without known structures. Whereas previous studies have approached the sampling problem by dividing the conformational variables into different sets and predicting them separately, we have refined the Biased-Probability Monte Carlo docking protocol in internal coordinates to optimize a physical energy function for all peptide variables simultaneously. We also imitated the induced fit by docking into a more permissive smooth grid representation of the MHC followed by refinement and reranking using an all-atom MHC model. Our method was tested by a comparison of the results of cross-docking 14 peptides into HLA-A*0201 and 9 peptides into H-2Kb as well as docking peptides into homology models for five different HLA allotypes with a comprehensive set of experimental structures.
The surprisingly accurate prediction (0.75 Å backbone RMSD) for cross-docking of a highly flexible decapeptide, dissimilar to the original bound peptide, as well as docking predictions using homology models for two allotypes with low average backbone RMSDs of less than 1.0 Å illustrate the method’s effectiveness. Finally, energy terms calculated using the predicted structures were combined with supervised learning on a large data set to classify peptides as either HLA-A*0201 binders or nonbinders. In contrast with sequence-based prediction methods, this model was also able to predict the binding affinity of peptides to a different MHC allotype (H-2Kb), not used for training, with comparable prediction accuracy.

Proteins 2006;63:512–526. © 2006 Wiley-Liss, Inc.

Key words: peptide docking; major histocompatibility complex (MHC); Monte Carlo optimization; homology models; potential grid; peptide binding prediction

INTRODUCTION

The binding of short peptide fragments of endogenous and foreign proteins to class I major histocompatibility complex (MHC) glycoproteins is a necessary first step in the immune surveillance by circulating cytotoxic T-cells. Peptides resulting from proteasomal processing of cytosolic proteins are transported to the endoplasmic reticulum by the transporter associated with antigen processing (TAP), where they bind to newly synthesized MHC molecules. The resulting complex is then transported to the cell surface, where the MHC is inserted into the membrane. These complexes are then recognized by CD8+ T-cells through peptide- and MHC allele-specific interactions with the T-cell receptor (TCR) as well as conserved interactions with the CD8 coreceptor.

MHC molecules are polymorphic with most variable residues in the peptide binding pocket so that each allotype preferentially binds a distinct subset of peptides.
Since, for example, an individual human can have cells expressing up to six different allotypes, this diversity presumably prevents potential antigens from escaping recognition by the cellular immune system. Also, a particular MHC allotype can strongly bind a large number of 8–11 residue peptides. Although most have preferred residue types in primary or secondary anchor positions, this is neither necessary nor sufficient for strong binding [1, 2]. This extreme variability in both components of the peptide-MHC complex, together with the limited number of available X-ray structures, makes computational prediction of the complex an important goal in molecular biology.

The Supplementary Material referred to in this article can be found at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
*Correspondence to: Andrew J. Bordner, Computer Science and Mathematics Division, Oak Ridge National Laboratory, P.O. Box 2008, MS 6173, Oak Ridge, TN 37831. E-mail: bordner@ornl.gov
Grant sponsor: National Institutes of Health; Grant number: 1R01GM071872-01. Grant sponsor: U.S. Department of Energy Genomics: GTL and Biopilot grants; Grant number: ORNL is operated under DOE contract number DE-AC05-00OR22725.
Received 3 May 2005; Revised 12 September 2005; Accepted 11 October 2005
Published online 7 February 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20831
PROTEINS: Structure, Function, and Bioinformatics 63:512–526 (2006) © 2006 WILEY-LISS, INC.

Accurate models of peptides bound to MHC are essential for structure-based prediction of peptide binding affinities. Position-specific scoring matrices [3–5] and machine learning methods [6–9] can predict peptide-MHC binding affinity reasonably accurately when a large amount of experimental