Applications of evolutionary SVM to prediction of membrane alpha-helices

Hassan B. Kazemian ⇑, Kenneth White, Dominic Palmer-Brown

London Metropolitan University, United Kingdom

Article info

Keywords: Alpha-helix transmembrane domain; Support vector machine; Genetic algorithm

Abstract

This paper is in the area of membrane proteins. Membrane proteins make up about 75% of possible targets for novel drug discovery. However, membrane proteins are one of the most understudied groups of proteins in biochemical research because of the technical difficulty of obtaining structural information about transmembrane (TM) regions or domains. Structural determination of TM regions is an important priority in the pharmaceutical industry, as it paves the way for structure-based drug design.

This research presents a novel evolutionary support vector machine (SVM) based alpha-helix transmembrane region prediction algorithm to identify the membrane helices in amino acid sequences. The SVM-genetic algorithm (GA) methodology is based on the optimisation of sliding window size, evolutionary encoding selection and SVM parameter optimisation. In this research, average hydrophobicity and propensity based on skew statistics are used to encode the one-letter representation of amino acid sequence datasets.

The computer simulation results demonstrate that the proposed SVM-GA methodology performs better than most conventional techniques, producing an accuracy of 86.71% for cross-validation and 86.43% for jack-knife for randomly selected proteins containing single and multiple transmembrane regions. Furthermore, for the amino acid sequence 3LVG, the proposed SVM-GA produces better alpha-helix region identification than PRED-TMR2, MEMSATSVM/MEMSAT3 and PSIPRED V3.0.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Most membrane proteins are attached to cell membranes via part of the protein's polypeptide chain, the transmembrane domain, which passes through the lipid (fat) bilayer of the membrane.
It was recognised long before the advent of genomics that the parts of proteins which contact membranes tend to be composed of fat-loving (hydrophobic) amino acids, which are thermodynamically stable in the fatty environment of the membrane. Transmembrane proteins account for about 30% of the proteome, and are estimated to represent about 75% of possible targets for novel drugs. However, there are considerable technical and experimental challenges to the elucidation of membrane protein structures, and hence these proteins are heavily under-represented in structural databases compared with soluble, globular proteins. Consequently, there is a strong need for algorithms which can predict the occurrence of transmembrane regions in a set of protein sequences and provide topological information for individual sequences (Elofsson & von Heijne, 2007).

Algorithms predicting transmembrane helices date from the work of Kyte and Doolittle (1982), who proposed a method based on hydrophobicity analysis and profiling for the identification of transmembrane segments in protein sequences. The method was markedly improved by including von Heijne's (1986) renowned positive-inside rule for transmembrane segments. These two core ideas were eventually exploited by Hirokawa, Boon-Chieng, and Mitaku (1998) and Pasquier and Hamodrakas (1999) in the development of the SOSUI and PRED-TMR systems, respectively.

With the advance of computing technologies, a range of artificial intelligence (AI) techniques have been applied to the problem of identifying transmembrane regions. One of the most widely used and trusted AI methodologies is the hidden Markov model (HMM), whose implementation in a number of TM region recognition toolkits has, to date, outperformed most other pattern recognition counterparts.
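
As an illustration of the hydrophobicity profiling attributed to Kyte and Doolittle (1982) above, the following minimal Python sketch averages the Kyte-Doolittle hydropathy index over a sliding window and flags windows whose mean exceeds a cut-off. It is not the SVM-GA method of this paper. The function names are hypothetical, and the window length of 19 and cut-off of 1.6 are values commonly quoted for this kind of profile, used here only as assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window in the sequence.

    Raises KeyError for residues outside the 20 standard one-letter codes.
    The default window of 19 is an assumption, not a setting from the paper.
    """
    scores = [KD_SCALE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(scores):
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_windows(sequence, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy exceeds threshold.

    The threshold of 1.6 is an assumed cut-off for illustration only.
    """
    profile = hydropathy_profile(sequence, window)
    return [start for start, mean in enumerate(profile) if mean > threshold]
```

For example, `candidate_windows(seq)` returns the start positions of 19-residue stretches of `seq` that are hydrophobic enough to be candidate membrane-spanning segments under these assumptions. A hydrophobicity profile alone ignores topology signals such as the positive-inside rule, which is one reason later methods combine several features.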
The HMM methodology has been used in conjunction with physico-chemical properties such as burial propensities and generic membrane helices (Krogh, Larsson, von Heijne, & Sonnhammer, 2001; Tusnady & Simon, 1998), for the prediction of helical re-entrant regions (Viklund, Granseth, & Elofsson, 2006) and for the prediction of beta-barrel outer membrane proteins (Bagos, Liakopoulos, Spyropoulos, & Hamodrakas, 2004; Bigelow, Petrey, Liu, Przybylski, & Rost, 2004; Martelli, Fariselli, Krogh, & Casadio, 2002). Neural networks (NN) are also regarded as one of the major AI techniques that can be effectively utilised in predicting the membrane

Expert Systems with Applications 40 (2013) 3412–3420
Journal homepage: www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2012.12.049
⇑ Corresponding author. Address: London Metropolitan University, Faculty of Life Sciences and Computing, Tower Building, Holloway Road, London N7 8DB, United Kingdom. E-mail address: h.kazemian@londonmet.ac.uk (H.B. Kazemian).