Simple jury predicts protein secondary structure best

Burkhard Rost 1,*, Pierre Baldi 2, Geoff Barton 3, James Cuff 4, Volker Eyrich 5,1, David Jones 6, Kevin Karplus 7, Ross King 8, Gianluca Pollastri 2, Dariusz Przybylski 1

Affiliations

1 CUBIC, Columbia Univ., Dept. of Biochemistry and Molecular Biophysics, 650 West 168th Street, New York, NY 10032, USA
2 Univ. of California, Irvine, Dept. of Information and Computer Science, Institute of Genomics and Bioinformatics, Irvine, CA 92697, USA
3 European Bioinformatics Institute, Genome Campus, Hinxton, Cambs CB10 1SD, England
4 The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA, England
5 Columbia Univ., Dept. of Chemistry, 3000 Broadway MC 3167, New York, NY 10027, USA
6 Dept. of Biological Sciences, Brunel Univ., 274348, Uxbridge, Middlesex UB8 3PH, England
7 Computer Engineering, Univ. of California, Santa Cruz, Santa Cruz, CA 95064, USA
8 The Univ. of Wales Aberystwyth, Dept. of Computer Science, Penglais, Ceredigion, SY23 3DB, Wales, UK

* Corresponding author: rost@columbia.edu, http://cubic.bioc.columbia.edu/

Abstract

The field of secondary structure prediction methods has advanced again. The best methods now reach levels of 74-76% of the residues correctly predicted in one of the three states helix, strand, or other. In the context of EVA/CASP, we experimented with averaging over the best current methods. The resulting jury decision proved significantly more accurate than the best method. Although the 'jury' seemed the best choice on average, for 60% of all proteins one method was better than the jury. Furthermore, the best individual methods tended to be superior to the jury in estimating the reliability of a prediction. Hence, averaging over predictions may be the method of choice for a quick scan of large data sets, while experts may profit from studying the respective method in detail.

INTRODUCTION

Secondary structure prediction improved again.
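To make the two quantities named in the abstract concrete, the sketch below computes the three-state per-residue accuracy and forms a jury over several predictions. It is an illustration under stated assumptions, not the procedure behind the results reported here: the text speaks of averaging over methods without giving details at this point, so a per-residue majority vote is swapped in as the simplest stand-in. The one-letter codes H, E, L (helix, strand, other), the function names `jury` and `q3`, the tie-breaking rule, and the example strings are all hypothetical.

```python
from collections import Counter


def jury(predictions):
    """Per-residue majority vote over equally long prediction strings.

    Each string uses one letter per residue: H (helix), E (strand),
    L (other). These letter codes are an assumption of this sketch.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Tie-break (assumption): keep the state of the earliest-listed method.
        winner = next(state for state in column if counts[state] == best)
        consensus.append(winner)
    return "".join(consensus)


def q3(predicted, observed):
    """Percentage of residues predicted in the correct one of three states."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up toy data: one observed structure and three method outputs.
observed = "HHHHLLEEEE"
methods = ["HHHLLLEEEE", "HHHHLLLEEE", "HHHHLEEEEL"]
consensus = jury(methods)
```

In this toy case each method errs at different residues, so the vote recovers the observed string even though no single method does. Real predictions share errors, which is why a jury can be best on average and still lose to an individual method on many proteins.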
Accuracy increased substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points, to its current level of around 76% of all residues predicted correctly in one of the three states helix, strand, or other. The best current methods solved most of the problems raised at earlier CASP meetings: all good methods now get segments right and perform well on strands. Could we improve prediction accuracy further by averaging over the best prediction methods?

METHODS

JPred2 1: JPred2 provides multiple sequence alignments constructed automatically by both PSI-BLAST searches and ClustalW alignments derived from a carefully filtered non-redundant sequence database. The default in JPred2 is to execute only the JNet 1 algorithm. For the results in this paper (Table 1), we used JPred2 with the option to combine JNet with PHD 2, 3, NNSSP 4, Predator 5, Mulpred (Geoff Barton, unpublished), DSC 6 and Zpred 7. We estimated that the combination of JNet with these methods improved prediction accuracy slightly.

PHDpsi 8: PHD 9 is a system of neural networks using evolutionary information as input. It was developed in 1994. The only difference between PHD and PHDpsi is that the latter uses divergent profiles as provided by PSI-BLAST 10.

Prof_king 11, 12: The input to Prof_king is a multiple sequence alignment produced using PSI-BLAST 10 and ClustalW 13. The alignment is "poly-transformed" into attributes using the approaches of GOR propensities 14, PHD profiles 2, and PSI-PRED profiles 15. The machine learning in Prof_king is a complicated combination of linear and quadratic discrimination, back-propagation neural networks, the use of different priors, and cascaded