HUMAN MUTATION 29(6), 852–860, 2008

RESEARCH ARTICLE

Accurate Classification of MLH1/MSH2 Missense Variants With Multivariate Analysis of Protein Polymorphisms–Mismatch Repair (MAPP-MMR)

Elizabeth C. Chao,1 Jonathan L. Velasquez,1 Mavee S.L. Witherspoon,1 Laura S. Rozek,2 David Peel,1 Pauline Ng,3 Stephen B. Gruber,2 Patrice Watson,4 Gad Rennert,5,6 Hoda Anton-Culver,1 Henry Lynch,4 and Steven M. Lipkin1*

1Genetic Epidemiology Research Institute, University of California, Irvine, Irvine, California; 2Department of Internal Medicine, Epidemiology, and Human Genetics, University of Michigan, Ann Arbor, Michigan; 3J. Craig Venter Institute for Human Genetics, Rockville, Maryland; 4Hereditary Cancer Institute, Creighton University School of Medicine, Omaha, Nebraska; 5Department of Community Medicine and Epidemiology, Carmel Medical Center and Technion Faculty of Medicine, Haifa, Israel; 6CHS National Cancer Control Center, Haifa, Israel

Communicated by Marc Greenblatt

Lynch syndrome, also known as hereditary nonpolyposis colon cancer (HNPCC), is the most common known genetic syndrome for colorectal cancer (CRC). MLH1/MSH2 mutations underlie approximately 90% of Lynch syndrome families. A total of 24% of these mutations are missense. Interpreting missense variation is extremely challenging. We have therefore developed multivariate analysis of protein polymorphisms–mismatch repair (MAPP-MMR), a bioinformatic algorithm that effectively classifies MLH1/MSH2 deleterious and neutral missense variants. We compiled a large database (n > 300) of MLH1/MSH2 missense variants with associated clinical and molecular characteristics. We divided this database into nonoverlapping training and validation sets and tested MAPP-MMR. MAPP-MMR significantly outperformed other missense variant classification algorithms, such as SIFT and PolyPhen (sensitivity, 94%; specificity, 96%; positive predictive value [PPV], 98%; negative predictive value [NPV], 89%).
MAPP-MMR is an effective bioinformatic tool for missense variant interpretation that accurately distinguishes MLH1/MSH2 deleterious variants from neutral variants. Hum Mutat 29(6), 852–860, 2008. © 2008 Wiley-Liss, Inc.

KEY WORDS: Lynch syndrome; colorectal cancer; variants of uncertain significance; cancer genetics; HNPCC; MLH1; MSH2

INTRODUCTION

MLH1 (MIM# 120436) and MSH2 (MIM# 609309) mutations underlie 90% of Lynch syndrome [Lynch and de la Chapelle, 2003] or hereditary nonpolyposis colon cancer (HNPCC). The clinical phenotype of MLH1/MSH2 mutations ranges from Bethesda guidelines (BG) [Rodriguez-Bigas et al., 1997; Umar et al., 2004] to familial colorectal cancer (CRC) (defined as proband plus either one affected first-degree relative or two affected second-degree relatives), and even sporadic CRC [Peltomaki et al., 2004]. Part of the variation in clinical phenotypes is attributable to missense mutations. A total of 24% of Lynch syndrome mutations are missense [Peltomaki et al., 2004]. There is widespread consensus among cancer geneticists that distinguishing deleterious from neutral variants is challenging. Supporting evidence that a missense variant is functionally relevant and confers risk includes microsatellite instability (MSI), cosegregation with affected relatives, tumor immunohistochemistry (IHC), biochemical analyses, and case–control studies. Many missense variants have minimal or conflicting supporting evidence (termed variants of uncertain significance [VUS]).

SIFT (http://blocks.fhcrc.org/sift/SIFT.html) and PolyPhen (http://coot.embl.de/PolyPhen) are bioinformatic algorithms that use evolutionary history and physicochemical parameters to interpret missense variants. They are 60 to 80% accurate in test studies [Chan et al., 2007; Chasman and Adams, 2001; Ng and Henikoff, 2001; Raevaara et al., 2005; Sunyaev et al., 2000b, 2001; Xi et al., 2004].
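Accuracy figures of this kind, like sensitivity, specificity, PPV, and NPV, are conventionally derived from a two-by-two comparison of predicted and reference classifications. The following is a minimal sketch in Python of those standard definitions. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts are invented for illustration; none of them are taken from this study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predictions with a reference classification."""
    tp: int  # deleterious variants predicted deleterious
    fn: int  # deleterious variants predicted neutral
    tn: int  # neutral variants predicted neutral
    fp: int  # neutral variants predicted deleterious


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions; assumes every denominator is nonzero."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # deleterious correctly flagged
        "specificity": c.tn / (c.tn + c.fp),  # neutral correctly cleared
        "ppv": c.tp / (c.tp + c.fp),          # deleterious calls that are right
        "npv": c.tn / (c.tn + c.fn),          # neutral calls that are right
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
    }


# Hypothetical counts for illustration only; NOT data from this study.
example = ConfusionCounts(tp=45, fn=5, tn=40, fp=10)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the proportion of deleterious variants in the set being evaluated, so they should be compared across studies with care.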
Published online 27 March 2008 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/humu.20735
The Supplementary Material referred to in this article can be accessed at http://www.interscience.wiley.com/jpages/1059-7794/suppmat.
Received 7 June 2007; accepted revised manuscript 21 December 2007.
*Correspondence to: Steven M. Lipkin, MD, PhD, Director, Cancer Genetics Clinic, Division of Hematology-Oncology, University of California, Irvine, 204 Sprague Hall, ZC 4038, Irvine, CA 92697. E-mail: slipkin@uci.edu

A recent study directly compared these and other algorithms for interpreting missense variants in several proteins, including MLH1 and MSH2 [Chan et al., 2007]. This study confirmed a high predictive value for algorithms that use evolutionary sequence conservation, with or without considering protein structural change, to predict the clinical consequences of missense variants [Chan et al., 2007]. A newer algorithm, multivariate analysis of protein polymorphisms (MAPP) [Stone and Sidow, 2005], utilizes a similar approach and is more