978-1-5090-0139-2/15/$31.00 ©2015 IEEE
A Heuristic Method to Bias Protein's Primary
Sequence in Protein Structure Prediction
Nasser Mozayani
Dept. of Computer Science,
Iran University of Science and Technology
Tehran, Iran
mozayani@iust.ac.ir
Hossein Parineh
Dept. of Computer Science
Iran University of Science and Technology
Tehran, Iran
Hossein.Parineh@gmail.com
Abstract— Protein Structure Prediction (PSP) is one of the
most studied topics in the field of bioinformatics. Because of the
intrinsic hardness of the problem, several computational methods,
mainly based on artificial intelligence, have been proposed over
the last decades to approach it. In this paper we break the main
process of PSP into two steps. The first step biases the sequence,
i.e. it very quickly provides a conformation with considerably
better energy than the primary (linear) sequence, whose energy is
zero. The second step, which is studied in a separate paper, feeds
this biased sequence to another algorithm to find the best possible
conformation. For the first step, we developed a new heuristic
method to find a low-energy structure of a protein. The main
concept of this method is rule extraction from previously
determined conformations. We call this method the Fast Bias
Algorithm (FBA), mainly because it derives a modified structure
with better energy from the primary (linear) structure of a protein
in a remarkably short time compared to the time needed for the
whole process. The method was implemented in NetLogo. We tested
the algorithm on several benchmark sequences ranging from 20-
to 50-mers in two-dimensional Hydrophobic-Hydrophilic (HP)
lattice models. Compared with the results of other algorithms,
our method reached up to 62% of the energy of their best
conformation in less than 2% of their time.
Keywords— Computational Biology ; Protein Structure
Prediction; Heuristic method; Fast Bias Algorithm; HP Model; 2D
lattice; NetLogo
I. INTRODUCTION
Proteins are the building blocks and functional molecules of the
human body and play a key role in almost every biological
process. Almost all cellular processes on earth are governed or
guided by proteins [1]. Progress on the PSP problem will
also improve our understanding of proteins involved in vital
processes, including diseases such as cancer [16]. About 39
million non-redundant protein sequences extracted from
DNA sequences are available at GenBank [14]. In contrast, only
about 100,000 protein 3-D structures are available from the
Protein Data Bank (PDB) [15]. Consequently, there is a huge gap
between our capacity to produce protein sequences and our
capacity to determine the 3-D structures of new proteins with
as yet unknown folds.
In this work we concentrate on ab initio modeling. Ab initio
methods are based on Anfinsen's thermodynamic
hypothesis [2]: the (native) conformation adopted by a protein
is the most stable one, i.e. the one with minimum free energy.
In its native conformation, the protein structure is
thermodynamically stable, which is consistent with the
principle of minimum Gibbs free energy, a consequence of the
second law of thermodynamics [4].
All-atom computer simulations are typically impractical
because of the huge amount of computation they require.
Even under simple abstractions the problem is NP-complete [5]. To
overcome the restrictions imposed by computational
complexity, several simplified models, such as the AB, HP, BLN
and Tube models, have been proposed.
Algorithm effectiveness is evaluated by the final energy of
the predicted structure. Unfortunately, because there are
several different models, there is no general agreement on
the potential function that should be used with them, and
several different energy functions can be found in the
literature.
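As an illustration of what such an energy function looks like, the
following minimal Python sketch evaluates the energy most commonly
used with the 2D HP lattice model: each pair of H residues that are
lattice neighbours but not consecutive in the chain contributes -1.
The function names (`hp_energy`, `linear_conformation`) and the
coordinate-list representation are assumptions made for this example;
they are not taken from the FBA implementation, which was written in
NetLogo.

```python
def linear_conformation(n):
    """Place n residues on a straight horizontal line (the unfolded chain)."""
    return [(i, 0) for i in range(n)]


def hp_energy(sequence, coords):
    """Return the HP-model energy of a conformation on the 2D square lattice.

    sequence: string over {'H', 'P'}
    coords:   list of (x, y) tuples, one per residue, in chain order
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            # Chain neighbours (|i - j| == 1) do not count as contacts.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

For example, `hp_energy("HPPH", linear_conformation(4))` gives 0,
matching the zero energy of the unfolded primary sequence, while
folding the same chain into the unit square
`[(0, 0), (1, 0), (1, 1), (0, 1)]` brings the two H residues into
contact and gives -1.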
II. BACKGROUND
A. Proteins
Proteins are polymers made up of chains of amino acids.
Only 20 different types of amino acids are used in protein
structures, so a protein can be represented by a string of
characters for computational purposes. The primary or linear
structure of a protein is a sequence of amino
acids $s_1, \ldots, s_n$. The secondary structure is the local
folding of the sequence, usually in the form of an α-helix, β-sheet or
coil. The tertiary structure is the composition of these
secondary structures, which shapes the protein in 3-dimensional
space. The quaternary structure is the composition of tertiary
structures. The 3-dimensional shape is called the functional,
native or biological fold/conformation [2], [3]. According to
Anfinsen's hypothesis [2], a protein in its native conformation is
in its most stable state and has the least free energy. A review of
the various forces and potentials can be found in [17]. The PSP
problem is to predict the native conformation of a
protein when its sequence of amino acids is known [18], [19].
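Since a protein can be treated as a string of characters, the step
from an amino-acid sequence to the two-letter alphabet of the HP
model can be sketched in a few lines of Python. The one-letter codes
are the standard ones for the 20 amino acids; the set of residues
treated as hydrophobic (`HYDROPHOBIC`) is an assumption chosen for
illustration only, since hydrophobicity scales differ, and the
function name `to_hp` is hypothetical.

```python
# Standard one-letter codes of the 20 amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# ASSUMPTION: one possible choice of hydrophobic residues; other
# classifications exist and would give a different HP string.
HYDROPHOBIC = set("AVLIMFWC")


def to_hp(primary):
    """Map a primary sequence (one-letter codes) to an HP string."""
    primary = primary.upper()
    unknown = set(primary) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in primary)
```

With this assumed classification, `to_hp("AVKD")` returns `"HHPP"`.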
SPIS2015, 16-17 Dec. 2015, Amirkabir University of Technology, Tehran, IRAN