978-1-5090-0139-2/15/$31.00 ©2015 IEEE

A Heuristic Method to Bias Protein's Primary Sequence in Protein Structure Prediction

Nasser Mozayani
Dept. of Computer Science, Iran University of Science and Technology, Tehran, Iran
mozayani@iust.ac.ir

Hossein Parineh
Dept. of Computer Science, Iran University of Science and Technology, Tehran, Iran
Hossein.Parineh@gmail.com

Abstract: Protein Structure Prediction (PSP) is one of the most studied topics in the field of bioinformatics. Given the intrinsic hardness of the problem, several computational methods, mainly based on artificial intelligence, have been proposed over the last decades. In this paper we break the PSP process into two steps. The first step biases the sequence, i.e., it very quickly produces a conformation with considerably better energy than the primary (linear) sequence, whose energy is zero. The second step, which is studied in a separate paper, feeds this biased sequence to another algorithm to find the best possible conformation. For the first step, we developed a new heuristic method to find a low-energy structure of a protein. The main concept of this method is rule extraction from previously determined conformations. We call this method the Fast-Bias-Algorithm (FBA), mainly because it derives a modified structure with better energy from the primary (linear) structure of a protein in a remarkably short time compared with the time needed for the whole process. The method was implemented in Netlogo. We tested the algorithm on several benchmark sequences, ranging from 20-mers to 50-mers, in two-dimensional hydrophobic-hydrophilic (HP) lattice models. Compared with the results of other algorithms, our method reached up to 62% of the energy of their best conformation in less than 2% of their running time.

Keywords: Computational Biology; Protein Structure Prediction; Heuristic Method; Fast Bias Algorithm; HP Model; 2D Lattice; Netlogo

I. INTRODUCTION

Proteins are the building blocks and functional molecules of the human body and play a key role in almost every biological process. Almost all cellular processes on earth are governed or guided by proteins [1]. Progress on the PSP problem will also improve our understanding of proteins involved in vital processes and in diseases such as cancer [16].

About 39 million non-redundant protein sequences have been extracted from DNA sequences and are available in GenBank [14]. In contrast, only about 100,000 protein 3-D structures are available from the Protein Data Bank (PDB) [15]. Consequently, there is a huge gap between our capacity to produce protein sequences and our capacity to determine the 3-D structures of new proteins with as yet unknown folds.

In this work we concentrate on ab-initio modeling. Ab-initio methods are based on the Anfinsen thermodynamic hypothesis [2]: the native conformation adopted by a protein is the most stable one, i.e., the one with minimum free energy. In its native conformation a protein is thermodynamically stable, which is consistent with the principle of minimum Gibbs free energy, a consequence of the second law of thermodynamics [4].

All-atom computer simulations are typically impractical because of the huge amount of computation they require. Even simplified abstractions of the problem are NP-complete [5]. To overcome the restrictions imposed by computational complexity, several simplified models have been proposed, such as the AB, HP, BLN, and Tube models.

An algorithm's effectiveness is evaluated by the final energy of the predicted structure. Unfortunately, because several different models exist, there is no general agreement on the potential function that should be used with them, and several different energy functions can be found in the literature.

II. BACKGROUND

A. Proteins

Proteins are polymers made up of chains of amino acids.
Only 20 different types of amino acids are used in proteins, so a protein can be represented as a string of characters for computational purposes. The primary, or linear, structure of a protein is a sequence of amino acids $s_1, \ldots, s_n$. The secondary structure is the local folding of the sequence, usually in the form of an α-helix, a β-sheet, or a coil. The tertiary structure is the arrangement of these secondary structures that gives the protein its shape in 3-dimensional space. The quaternary structure is the composition of several tertiary structures. The 3-dimensional shape is called the functional, native, or biological fold or conformation [2], [3].

According to Anfinsen's hypothesis [2], a protein in its native conformation is in its most stable state and has the lowest free energy. A review of the various forces and potentials can be found in [17]. The PSP problem is to predict the native conformation of a protein when its sequence of amino acids is known [18], [19].

SPIS2015, 16-17 Dec. 2015, Amirkabir University of Technology, Tehran, IRAN
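
As an illustration of the string representation and of the zero-energy linear structure mentioned in the abstract, the following minimal Python sketch scores a conformation on a 2D lattice. It assumes the commonly used HP convention, in which every pair of H residues that are lattice neighbours but not chain neighbours contributes -1 to the energy; this section does not itself fix an energy function. The names `hp_energy` and `linear_conformation` are hypothetical, and the sketch is neither the FBA nor the authors' Netlogo implementation.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def linear_conformation(length: int) -> List[Coord]:
    """Place residues on a straight horizontal line (the unfolded chain)."""
    return [(i, 0) for i in range(length)]


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Energy of a 2D lattice conformation under the assumed HP convention.

    sequence: string over {'H', 'P'}, one character per residue.
    coords:   lattice position of each residue, in chain order.
    Returns -1 for every H-H pair that is adjacent on the lattice
    but not consecutive in the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each lattice pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    seq = "HPPH"
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # chain folded into a unit square
    print(hp_energy(seq, linear_conformation(len(seq))))
    print(hp_energy(seq, square))
```

Under this convention the straight chain has no H-H contacts, while folding the four-residue example into a square brings its two H residues together, so any bias step that lowers the energy must create such contacts.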