Prediction of β-turns and β-turn types by a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN)

Andreas Kirschner, Dmitrij Frishman
Department of Genome Oriented Bioinformatics, Technische Universität München, Weihenstephan, Germany

ARTICLE INFO
Article history: Available online 10 June 2008
Keywords: Structural bioinformatics; Protein structure prediction; Machine learning; Classification

ABSTRACT
Prediction of β-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics, because β-turns occur frequently and are structurally and functionally significant. Because various structural features of proteins are intercorrelated, secondary structure information has often been used as an additional input to machine learning algorithms for β-turn prediction. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs, and we demonstrate its efficiency in recognizing three aspects of protein structure: β-turns, β-turn types, and secondary structure. The advantage of our method over other predictors is that it requires no external input other than sequence profiles, because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset, our method achieves a total prediction accuracy of 77.9% and a Matthews correlation coefficient (MCC) of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted.
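The two figures of merit quoted above, total prediction accuracy and the Matthews correlation coefficient, are both computed from per-residue turn/non-turn counts. The Python sketch below shows the standard definitions, assuming binary labels (1 = turn residue, 0 = non-turn residue) and the common convention of returning an MCC of 0 when any marginal count is zero. The function names `confusion_counts`, `q_total` and `mcc` are illustrative only and are not taken from the MOLEBRNN software.

```python
import math
from typing import Sequence


def confusion_counts(observed: Sequence[int], predicted: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN) for per-residue labels, 1 = turn, 0 = non-turn."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if obs == 1 and pred == 1:
            tp += 1
        elif obs == 0 and pred == 0:
            tn += 1
        elif obs == 0 and pred == 1:
            fp += 1
        else:  # obs == 1 and pred == 0
            fn += 1
    return tp, tn, fp, fn


def q_total(tp: int, tn: int, fp: int, fn: int) -> float:
    """Total accuracy: fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal sum is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For 40 true positives, 40 true negatives, 10 false positives and 10 false negatives, `q_total` gives 0.8 and `mcc` gives 0.6. The MCC matters for β-turns because the classes are unbalanced: if roughly a quarter of residues are turn residues, a predictor that labels every residue as non-turn reaches 75% accuracy but an MCC of 0.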
Availability: http://webclu.bio.wzw.tum.de/predator-web/.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

β-turns are defined as reversals in the direction of the polypeptide chain consisting of four consecutive amino acid residues, in which the first and the last residues are situated in close proximity to each other and the two central residues are not part of an α-helix (Venkatachalam, 1968). β-turns are classified into nine different types based on the dihedral angles of their two central residues (Hutchinson and Thornton, 1994). Local interactions in β-turns play an important role in initiating protein folding and stabilizing protein structure (Zimmerman and Scheraga, 1977). Approximately every fourth amino acid residue in globular proteins is found in a β-turn (Kabsch and Sander, 1983), and most β-turns are located on the protein surface (Rose et al., 1985), where they are often involved in intramolecular binding, cleavage, and posttranslational modification events. In particular, the role of β-turns in antigen recognition and antibody binding has been documented (see, for example, Hinds et al., 1991; Rini et al., 1993). Binding specificity and sensitivity may depend on a particular turn subtype (Bach et al., 1996; Li et al., 1999).

The evolution of β-turn prediction methods has closely followed developments in protein secondary structure prediction. Early approaches (Chou and Fasman, 1979; Hutchinson and Thornton, 1994) relied on β-turn-type-dependent, position-specific potentials for each residue, derived from known three-dimensional structures of proteins. In particular, β-turns were found to be enriched in hydrophilic residues owing to their frequent solvent exposure (Rose, 1978). Zhang and Chou (1997) extended this simple approach by considering residue correlations between positions 1–4 and 2–3 of the turn tetrapeptide.
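A position-specific potential of this kind can be illustrated with the classic Chou–Fasman turn score, in which the turn probability of a tetrapeptide is the product of the frequencies with which each of its residues occurs at turn positions 1 to 4. The sketch below assumes a small, made-up frequency table (`HYPOTHETICAL_FREQ`), a default frequency for unlisted residues (`DEFAULT_FREQ`) and an arbitrary decision threshold (`HYPOTHETICAL_THRESHOLD`); none of these values come from the published tables, and the 1–4 and 2–3 residue correlations of Zhang and Chou (1997) are not modelled.

```python
from math import prod

# HYPOTHETICAL positional frequencies: index 0..3 = turn positions 1..4.
# Real tables are derived from proteins of known three-dimensional structure.
HYPOTHETICAL_FREQ: list[dict[str, float]] = [
    {"N": 0.16, "P": 0.10, "G": 0.10},
    {"N": 0.08, "P": 0.30, "G": 0.06},
    {"N": 0.15, "P": 0.03, "G": 0.15},
    {"N": 0.05, "P": 0.05, "G": 0.10},
]
DEFAULT_FREQ = 0.05             # assumed frequency for residues not in the table
HYPOTHETICAL_THRESHOLD = 1e-4   # assumed cut-off, not a published value


def turn_scores(sequence: str,
                freq: list[dict[str, float]] = HYPOTHETICAL_FREQ,
                default: float = DEFAULT_FREQ) -> list[float]:
    """Score each window of four residues as the product of its positional frequencies."""
    seq = sequence.upper()
    scores = []
    for start in range(len(seq) - 3):
        window = seq[start:start + 4]
        scores.append(prod(freq[pos].get(aa, default) for pos, aa in enumerate(window)))
    return scores


def predict_turn_windows(sequence: str, threshold: float = HYPOTHETICAL_THRESHOLD) -> list[int]:
    """Return start indices of windows whose score exceeds the threshold."""
    return [start for start, score in enumerate(turn_scores(sequence)) if score > threshold]
```

For the toy sequence "ANPGG", only the window starting at index 1 ("NPGG", score 0.16 × 0.30 × 0.15 × 0.10) passes the assumed threshold. Because every residue contributes independently, such a score cannot capture the pairwise dependencies that Zhang and Chou introduced, nor the wider sequence context exploited by later machine learning methods.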
Fuchs and Alix (2005) additionally weight β-turn propensities according to the evolutionary conservation of the respective residue positions.

Gene 422 (2008) 22–29. Contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/gene. doi:10.1016/j.gene.2008.06.008. 0378-1119/$ – see front matter.

Abbreviations: AUC, Area Under the Curve; BRNN, Bidirectional Recurrent Neural Network; EBRNN, Elman-type Bidirectional Recurrent Neural Network; MCC, Matthews correlation coefficient; MOLEBRNN, Multi-Output-Layer Elman-type Bidirectional Recurrent Neural Network; PSSM, Position-Specific Scoring Matrix; RNN, Recurrent Neural Network; ROC, Receiver Operating Characteristic; SVM, Support Vector Machine.

Corresponding author. Department of Genome-oriented Bioinformatics, Am Forum 1, 85354 Freising-Weihenstephan, Germany. Tel.: +49 8161 712139. E-mail address: a.kirschner@wzw.tum.de (A. Kirschner).

The second major group of methods is based on machine learning algorithms that, when trained on a database of known conformations, learn the mapping from the amino acid sequence to the per-residue β-turn propensity. Neural networks have been widely used for this purpose, starting with the work of McGregor et al. (1989). Shepherd et al. (1999) applied a two-layer neural network architecture, with predicted secondary structure included as additional information at the second stage. β-turn prediction accuracy was further boosted by using PSI-BLAST-derived position-specific scoring matrices rather than single sequences as input for neural networks (Kaur and Raghava, 2003) and the k-nearest neighbor algorithm (Kim, 2004). More recently, Support Vector Machines (SVMs) have become popular for sequence-based