Prediction of β-turns and β-turn types by a novel bidirectional Elman-type recurrent
neural network with multiple output layers (MOLEBRNN)
Andreas Kirschner ⁎, Dmitrij Frishman
Department of Genome Oriented Bioinformatics, Technische Universität München, Weihenstephan, Germany
ARTICLE INFO
Article history:
Available online 10 June 2008
Keywords:
Structural bioinformatics
Protein structure prediction
Machine learning
Classification
ABSTRACT
Prediction of β-turns from amino acid sequences has long been recognized as an important problem in structural
bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because
various structural features of proteins are intercorrelated, secondary structure information has often been
employed as an additional input for machine learning algorithms when predicting β-turns. Here we present a
novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of
predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three
aspects of protein structure: β-turns, β-turn types, and secondary structure. The advantage of our method
compared to other predictors is that it does not require any external input except for sequence profiles because
interdependencies between different structural features are taken into account implicitly during the learning
process. In a sevenfold cross-validation experiment on a standard test dataset, our method achieves a total
prediction accuracy of 77.9% and a Matthews correlation coefficient of 0.45, the highest performance reported
so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how
simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN
presented here is a generic method applicable in a variety of research fields where multiple mutually dependent
target classes need to be predicted. Availability: http://webclu.bio.wzw.tum.de/predator-web/.
© 2008 Elsevier B.V. All rights reserved.
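The two performance measures quoted in the abstract can be illustrated with a minimal sketch. It assumes per-residue binary labels (1 = β-turn, 0 = non-turn) and uses the standard definitions of total accuracy and the Matthews correlation coefficient; the function names and the toy label vectors are illustrative assumptions and are not taken from the paper.

```python
import math


def confusion_counts(observed, predicted):
    """Count TP, TN, FP, FN for per-residue binary labels (1 = turn, 0 = non-turn)."""
    if len(observed) != len(predicted):
        raise ValueError("label sequences must have equal length")
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1
        elif not obs and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def total_accuracy(tp, tn, fp, fn):
    """Percentage of residues whose turn/non-turn state is predicted correctly."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total if total else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical, for illustration only).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(observed, predicted)
accuracy = total_accuracy(*counts)
mcc = matthews_cc(*counts)
```

Because non-turn residues outnumber turn residues, a predictor can reach a high total accuracy by favouring the majority class; the correlation coefficient penalizes this, which is why both numbers are usually reported together.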
1. Introduction
β-turns are defined as reversals in direction of the polypeptide chain
consisting of four consecutive amino acid residues, with the first and the
last residue situated in close proximity to each other and the two central
residues not being part of an α-helix (Venkatachalam, 1968). β-turns are
classified into nine different types based on the dihedral angles of their
two central residues (Hutchinson and Thornton, 1994). Local interactions
in β-turns play an important role in initiating protein folding and
stabilizing protein structure (Zimmerman and Scheraga, 1977). Approximately
every fourth amino acid residue in globular proteins is found in a
β-turn (Kabsch and Sander, 1983), and most of the β-turns are located on
the protein surface (Rose et al., 1985) where they are often involved in
intra-molecular binding, cleavage, and posttranslational modification
events. In particular, the role of β-turns in antigen recognition and
antibody binding has been documented (see for example Hinds et al.,
1991; Rini et al., 1993). The binding specificity and sensitivity may
depend on a particular turn subtype (Bach et al., 1996; Li et al., 1999).
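The geometric definition given at the start of this section can be sketched as a simple scan over tetrapeptides. The sketch assumes that "close proximity" is expressed as a Cα(i)–Cα(i+3) distance below 7 Å, a commonly used cutoff that is not stated in the text above, and that helix membership is supplied as one boolean flag per residue. The function names and the toy coordinates are hypothetical.

```python
import math

# Assumed cutoff in angstroms; the text only says "close proximity".
CA_DISTANCE_CUTOFF = 7.0


def find_beta_turns(ca_coords, helix_flags, cutoff=CA_DISTANCE_CUTOFF):
    """Return start indices i of tetrapeptides i..i+3 that satisfy the definition.

    ca_coords:   list of (x, y, z) C-alpha coordinates, one per residue
    helix_flags: list of booleans, True if the residue is part of an alpha-helix
    """
    if len(ca_coords) != len(helix_flags):
        raise ValueError("coordinates and helix flags must have equal length")
    starts = []
    for i in range(len(ca_coords) - 3):
        ends_close = math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff
        central_helical = helix_flags[i + 1] or helix_flags[i + 2]
        if ends_close and not central_helical:
            starts.append(i)
    return starts


def residues_in_turns(n_residues, starts):
    """Mark every residue that belongs to at least one detected turn with 1."""
    labels = [0] * n_residues
    for i in starts:
        for j in range(i, i + 4):
            labels[j] = 1
    return labels


# Toy chain of six residues (hypothetical coordinates in angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
          (1.5, 5.0, 0.0), (-2.3, 5.0, 0.0), (-6.1, 5.0, 0.0)]
no_helix = [False] * 6
turn_starts = find_beta_turns(coords, no_helix)
turn_labels = residues_in_turns(len(coords), turn_starts)
```

Per-residue labels of this kind are the natural training target for a sequence-based predictor, since overlapping turns collapse into a single turn/non-turn state for each residue.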
The evolution of β-turn prediction methods closely followed the
developments in the area of protein secondary structure prediction. Early
approaches (Chou and Fasman, 1979; Hutchinson and Thornton, 1994)
relied on β-turn-type-dependent, position-specific potentials for each
residue derived from known three-dimensional structures of proteins. In
particular, it was found that β-turns tended to be enriched in hydrophilic
residues owing to their frequent solvent exposure (Rose, 1978). Zhang
and Chou (1997) extended this simple approach by considering residue
correlations between positions 1–4 and 2–3 in the turn tetra-peptides.
Fuchs and Alix (2005) additionally weighted β-turn propensities according
to the evolutionary conservation of the respective residue positions.
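The idea behind position-specific propensities can be illustrated with a minimal sketch. It is not a reimplementation of any of the cited methods: it assumes that propensities are estimated as the ratio of the smoothed frequency of an amino acid at each of the four turn positions to its smoothed background frequency, and that a tetrapeptide is scored by the sum of log-propensities. The pseudocount, the function names, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_propensities(turn_tetrapeptides, background_sequence, pseudocount=1.0):
    """Estimate propensity[k][aa] = P(aa | turn position k) / P(aa | background).

    A pseudocount (an assumption of this sketch) avoids zero frequencies.
    """
    background = Counter(background_sequence)
    bg_total = len(background_sequence) + pseudocount * len(AMINO_ACIDS)
    turn_total = len(turn_tetrapeptides) + pseudocount * len(AMINO_ACIDS)
    table = []
    for k in range(4):
        counts = Counter(tetra[k] for tetra in turn_tetrapeptides)
        table.append({
            aa: ((counts[aa] + pseudocount) / turn_total)
                / ((background[aa] + pseudocount) / bg_total)
            for aa in AMINO_ACIDS
        })
    return table


def log_score(tetrapeptide, table):
    """Sum of log-propensities over the four positions of a candidate turn."""
    return sum(math.log(table[k][aa]) for k, aa in enumerate(tetrapeptide))


def scan(sequence, table):
    """Score every overlapping tetrapeptide window of a sequence."""
    return [log_score(sequence[i:i + 4], table) for i in range(len(sequence) - 3)]


# Toy training data (hypothetical): three turn tetrapeptides and a uniform background.
toy_turns = ["NPDG", "DPNG", "SPDG"]
toy_background = AMINO_ACIDS * 2
toy_table = positional_propensities(toy_turns, toy_background)
window_scores = scan("AANPDGLL", toy_table)
best_start = window_scores.index(max(window_scores))
```

Treating the four positions independently is the simplification that pairwise schemes, such as the residue correlations between positions 1–4 and 2–3 mentioned above, were designed to relax.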
The second major group of methods is based on machine intelligence
algorithms that learn the mapping from the amino acid sequence to the
residue β-turn propensity when trained on a database of known
conformations. Neural networks have been widely used for this purpose,
starting with the work of McGregor et al. (1989). Shepherd et al. (1999)
applied a two-layer neural network architecture, with predicted
secondary structure included as additional information at the second
stage. The β-turn prediction accuracy was further boosted by utilizing
PSI-BLAST-derived position-specific scoring matrices rather than single
sequences as input for neural networks (Kaur and Raghava, 2003) and
the k-nearest neighbor algorithm (Kim, 2004). More recently, Support
Vector Machines (SVM) have become popular for sequence-based
Gene 422 (2008) 22–29
Abbreviations: AUC, Area Under Curve; BRNN, Bidirectional Recurrent Neural
Network; EBRNN, Elman-type Bidirectional Recurrent Neural Network; MCC, Matthews
correlation coefficient; MOLEBRNN, Multi Output Layer Elman-type Bidirectional
Recurrent Neural Network; PSSM, Position Specific Scoring Matrix; RNN, Recurrent
Neural Network; ROC, Receiver Operating Characteristic; SVM, Support Vector Machine.
⁎ Corresponding author. Department of Genome-oriented Bioinformatics, Am Forum
1, 85354 Freising-Weihenstephan, Germany. Tel.: +49 8161 712139.
E-mail address: a.kirschner@wzw.tum.de (A. Kirschner).
0378-1119/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2008.06.008