PROTEINS: STRUCTURE • FUNCTION • BIOINFORMATICS
PREDICTION REPORT

Prediction of global and local quality of CASP8 models by MULTICOM series

Jianlin Cheng,1,2,3* Zheng Wang,1 Allison N. Tegge,2 and Jesse Eickholt1

1 Computer Science Department, University of Missouri, Columbia, Missouri
2 Informatics Institute, University of Missouri, Columbia, Missouri
3 Bond Life Science Center, University of Missouri, Columbia, Missouri

INTRODUCTION

In recent years, protein structure prediction has become efficient enough that hundreds of alternative models of varying quality (accuracy) can be generated in a relatively short time.1 As a result, model quality assurance programs (MQAPs) are needed to assess, refine, rank, and select the highest-quality models. Furthermore, an accurate MQAP can ensure the correct application of a model.2,3

MQAP methods can be divided into two categories: global model quality prediction and local (residue-specific) model quality prediction. Among global quality predictors, most methods output relative scores that can be used to discriminate native or near-native structures from decoys, while a few output absolute scores that directly indicate the similarity between a model and the native structure.3 Techniques frequently used by QA predictors include clustering (multiple-model) methods1,4–8 and single-model methods.3,7,9–13 Clustering methods assume that models that are highly similar to many other models are of better quality. Single-model methods make predictions by analyzing sequence alignment features14 or structural features, such as solvent exposure, secondary structure, contact probability maps, and probability maps of β-strand residue pairing. We participated in the CASP8 (eighth Critical Assessment of Techniques for Protein Structure Prediction) quality assessment experiments with the MULTICOM series.
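A minimal sketch of the clustering (multiple-model) idea described above, assuming a precomputed symmetric matrix of pairwise model similarities in [0, 1] (for example, GDT-TS computed between pairs of models). Each model is scored by its average similarity to all other models, so members of the largest consensus group score highest and structural outliers score lowest. The function name `clustering_scores` and the example matrix are hypothetical; this is a generic illustration of the technique, not one of the MULTICOM implementations.

```python
from typing import List, Sequence


def clustering_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """Score each model by its average similarity to all *other* models.

    similarity[i][j] is assumed to be a symmetric pairwise structural
    similarity in [0, 1] between models i and j (hypothetical input).
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models to compare")
    scores = []
    for i in range(n):
        # Exclude the self-comparison similarity[i][i].
        total = sum(similarity[i][j] for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores


# Hypothetical pairwise similarities for four models; model 3 is an outlier.
sim = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.75, 0.3],
    [0.7, 0.75, 1.0, 0.25],
    [0.2, 0.3, 0.25, 1.0],
]
scores = clustering_scores(sim)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Here model 1 is most similar to the rest of the pool and is ranked first, while the outlier model 3 receives the lowest score. The approach needs a pool of models, which is why it cannot score a single model in isolation.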
The MULTICOM series is a set of predictors incorporating various quality assessment techniques, such as semi-clustering approaches, single-model machine learning approaches, and meta and hybrid approaches that combine two or more single approaches.

The authors state no conflict of interest.
Grant sponsors: MU Bioinformatics Consortium, UM research board grant, MU research council grant, NLM fellowship.
*Correspondence to: Computer Science Department, University of Missouri, Columbia, Missouri 65211. E-mail: chengji@missouri.edu.
Received 11 March 2009; Revised 27 April 2009; Accepted 12 May 2009
Published online 28 May 2009 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.22487

ABSTRACT

Evaluating the quality of protein structure models is important for selecting and using models. Here, we describe the MULTICOM series of model quality predictors, which contains three predictors tested in the CASP8 experiments. We evaluated these predictors on 120 CASP8 targets. The average correlations between predicted and real GDT-TS scores of the two semi-clustering methods (MULTICOM and MULTICOM-CLUSTER) and the single-model ab initio method (MULTICOM-CMFR) are 0.90, 0.89, and 0.74, respectively, and their average differences (GDT-TS losses) between the global GDT-TS scores of the top-ranked models and those of the best models are 0.05, 0.06, and 0.07, respectively. The average correlation between predicted and real local quality scores of the semi-clustering methods is above 0.64. Our results show that the novel semi-clustering approach, which compares a model with top-ranked reference models, can improve the initial quality scores generated by the ab initio method and a simple meta approach.

Proteins 2009; 00:000–000. © 2009 Wiley-Liss, Inc.

Key words: protein structure prediction; protein model quality assessment; model quality assurance program; clustering.
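The semi-clustering idea and the two global evaluation measures in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the MULTICOM code: pairwise similarities are assumed to be precomputed GDT-TS-like values in [0, 1]; the reference set is assumed to be the top `k` models ranked by initial (for example, single-model) scores; and `k`, the function names, and all numbers are hypothetical. GDT-TS loss is computed as the real GDT-TS of the best model minus that of the top-ranked model, and the correlation is assumed here to be Pearson's r between predicted and real scores on one target.

```python
from statistics import mean
from typing import List, Sequence


def semi_clustering_scores(initial: Sequence[float],
                           similarity: Sequence[Sequence[float]],
                           k: int = 3) -> List[float]:
    """Re-score each model by its mean similarity to the top-k reference models.

    References are the k models with the highest initial scores (k is a
    hypothetical parameter). A reference model is compared only with the
    other references, never with itself.
    """
    order = sorted(range(len(initial)), key=lambda i: initial[i], reverse=True)
    refs = order[:k]
    scores = []
    for i in range(len(initial)):
        sims = [similarity[i][r] for r in refs if r != i]
        # Fall back to the initial score if no other reference exists.
        scores.append(mean(sims) if sims else initial[i])
    return scores


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for constant inputs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def gdt_ts_loss(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Real GDT-TS of the best model minus that of the top-ranked model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(real) - real[top]


# Hypothetical single target with four models; model 3 is a structural outlier.
sim = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.75, 0.3],
    [0.7, 0.75, 1.0, 0.25],
    [0.2, 0.3, 0.25, 1.0],
]
initial = [0.60, 0.45, 0.50, 0.58]  # hypothetical single-model scores
real = [0.66, 0.70, 0.64, 0.30]     # hypothetical real GDT-TS scores

semi = semi_clustering_scores(initial, sim, k=3)
print(round(gdt_ts_loss(initial, real), 2), round(gdt_ts_loss(semi, real), 2))
print(pearson(initial, real) < 0 < pearson(semi, real))
```

On this toy target, the hypothetical initial scores rank model 0 first (loss of about 0.04) and are negatively correlated with the real scores. Re-scoring against the three top-ranked reference models ranks model 1 first (loss 0), makes the correlation positive, and pushes the outlier model 3 to the bottom. Averages like those reported in the abstract would be obtained by computing these per-target values for every target and taking their mean.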