Prediction of Secondary Structures of Proteins Using a Two-Stage Method

Metin Turkay, Ozlem Yilmaz and Fadime Uney Yuksektepe

College of Engineering, Koç University, Rumelifeneri Yolu, Sarıyer, 34450 İstanbul, TURKEY

Abstract

Protein structure determination and prediction have been a focal research subject in the life sciences because protein structure is important for understanding the biological and chemical activities in any organism. The experimental methods used to determine the structures of proteins demand sophisticated equipment and time. To overcome these shortcomings, a host of algorithms that predict the location of secondary structure elements using statistical and computational methods have been developed. However, the prediction accuracies of these methods have rarely exceeded 70%. In this paper, a novel two-stage method is presented that predicts the location of secondary structure elements in a protein using primary structure data only. In the first stage of the proposed method, the folding type of a protein is determined using a novel classification model for multi-class problems. The second stage utilizes data available in the Protein Data Bank and determines the possible location of secondary structure elements using a probabilistic search algorithm. It is shown that the average accuracy of the predictions increases to 74.1%.

Keywords: Protein Structure, Data Classification, Mixed-Integer Linear Programming

1. Introduction

Proteins are large molecules indispensable for the existence and proper functioning of biological organisms. Proteins are used in the structure of cells, which are the main constituents of larger formations such as tissues and organs. The bones, muscles, skin and hair of organisms are made up basically of proteins. Besides their structural role, proteins are also required for the proper functioning and regulation of organisms, as enzymes, hormones and antibodies.
Understanding the functions of proteins is crucial for the discovery of drugs to treat various diseases and disorders.

A protein molecule consists of one or more chains of amino acids, which are also called residues. A typical protein contains 200 to 300 amino acids, but this may increase up to approximately 30,000 in a single chain. There are four basic structural levels in proteins: primary structure, secondary structure, tertiary structure and quaternary structure. The primary structure is the sequence of amino acids that make up the protein. The secondary structure of a segment of a polypeptide chain is the local spatial arrangement of its main-chain atoms, without regard to the conformation of its side chains or to its relationship with other segments. This is the shape formed by amino acid sequences due to interactions between different parts of the molecule.

There are three main types of secondary structure: α-helices, β-sheets and other structures that connect these, such as loops, turns or coils. α-helices are spiral strings formed by hydrogen bonds between the CO and NH groups of residues. β-sheets are flat strands formed by a stretched polypeptide backbone. Connecting structures do not have regular shapes; they connect α-helices and β-sheets to each other. Turns enable parts of polypeptide chain to

16th European Symposium on Computer Aided Process Engineering and 9th International Symposium on Process Systems Engineering. W. Marquardt, C. Pantelides (Editors). © 2006 Published by Elsevier B.V.