Prediction of Secondary Structures of Proteins
Using a Two-Stage Method
Metin Turkay, Ozlem Yilmaz and Fadime Uney Yuksektepe
College of Engineering, Koç University, Rumelifeneri Yolu, Sarıyer, 34450 İstanbul,
TURKEY
Abstract
Protein structure determination and prediction have been a focal research subject in the life
sciences because protein structure is central to understanding the biological and
chemical activities in any organism. The experimental methods used to determine the
structures of proteins demand sophisticated equipment and time. To overcome
these shortcomings, a host of algorithms that predict
the location of secondary structure elements using statistical and computational methods
have been developed. However, the prediction accuracies of these methods have rarely exceeded 70%.
This paper presents a novel two-stage method that predicts the location of secondary structure
elements in a protein using only primary structure data. In the first
stage of the proposed method, the folding type of a protein is determined using a novel
classification model for multi-class problems. The second stage utilizes
data available in the Protein Data Bank and determines the possible location of
secondary structure elements with a probabilistic search algorithm. It is shown that the
average accuracy of the predictions increased to 74.1%.
Keywords: Protein Structure, Data Classification, Mixed-Integer Linear Programming
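As a minimal illustration of the quantities named in the abstract, the sketch below encodes a primary structure as a string of one-letter amino acid codes and a secondary structure as a string of per-residue states, then scores a prediction. It assumes that accuracy means the fraction of residues whose predicted state matches the observed one; the abstract does not define the measure, so this is an assumption. The three-letter alphabet (H, E, C), the function name and the example strings are likewise hypothetical and are not data or notation from this paper.

```python
# Hypothetical illustration only: the alphabet, sequences and labels below
# are invented for this sketch and are not taken from the paper.
HELIX, SHEET, OTHER = "H", "E", "C"  # assumed three-state alphabet


def per_residue_accuracy(observed: str, predicted: str) -> float:
    """Return the fraction of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("structure strings must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Invented primary structure (one-letter amino acid codes), one state per residue.
primary = "MKTAYIAKQR"
observed = "CHHHHHHEEC"   # assumed experimentally determined states
predicted = "CHHHHHCEEC"  # assumed output of some predictor

assert len(primary) == len(observed) == len(predicted)
accuracy = per_residue_accuracy(observed, predicted)
print(f"Per-residue accuracy: {accuracy:.1%}")
```

Under this assumed definition, a reported accuracy of 74.1% would mean that roughly three out of four residues are assigned the correct state.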
1. Introduction
Proteins are large molecules indispensable for the existence and proper functioning of
biological organisms. Proteins form part of the structure of cells, which are the main constituents
of larger formations such as tissues and organs. The bones, muscles, skin and hair of organisms
are essentially made up of proteins. Besides their structural role, proteins are also
required for the proper functioning and regulation of organisms, as enzymes,
hormones and antibodies. Understanding the functions of proteins is crucial for the discovery of
drugs to treat various diseases and disorders.
A protein molecule consists of one or more chains of amino acids, also called residues. A typical protein
contains 200–300 amino acids, but this number may reach approximately 30,000 in a
single chain. There are four basic levels of protein structure: primary structure,
secondary structure, tertiary structure and quaternary structure. The primary structure is
the sequence of amino acids that make up the protein. The secondary structure of a
segment of a polypeptide chain is the local spatial arrangement of its main-chain atoms
without regard to the conformation of its side chains or to its relationship with other
segments. It is the shape formed by amino acid sequences due to interactions
between different parts of the molecule. There are three main types of secondary
structural shapes: α-helices, β-sheets and other structures connecting these, such as
loops, turns or coils. α-helices are spiral strings formed by hydrogen bonds
between the CO and NH groups in residues. β-sheets are planar strands formed by
a stretched polypeptide backbone. Connecting structures do not have regular shapes; they
connect α-helices and β-sheets to each other. Turns enable parts of the polypeptide chain to
16th European Symposium on Computer Aided Process Engineering
and 9th International Symposium on Process Systems Engineering
W. Marquardt, C. Pantelides (Editors)
© 2006 Published by Elsevier B.V.