International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-9 Issue-2, December, 2019
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: B3136129219/2019©BEIESP
DOI: 10.35940/ijeat.B3136.129219
Abstract: Sequence alignment is a central task in
bioinformatics research for molecular sequence analysis.
Pairwise Sequence Alignment (PSA) is the arrangement of two
biological sequences so as to maximize the similarity
between them by inserting and adjusting gaps. The
arrangement of multiple sequences in this way is Multiple
Sequence Alignment (MSA). Although dynamic programming can
produce an optimal alignment for PSA, it runs into
difficulty when multiple optimal paths are present and
traceback is required: backtracking becomes complex, and the
approach is also not suitable for MSA. For this reason, many
meta-heuristic algorithms, such as the Genetic Algorithm
(GA) and the Differential Evolutionary Algorithm (DE), have
been developed in recent years to address the optimization
problem. Both GA and DE are used to produce optimal sequence
alignments, but DE produces better alignments than GA. To
further enhance the performance of DE, a new mutation
strategy that considers the best, the worst and a random
candidate solution is proposed and applied to DE. The
resulting algorithm is named the New Differential
Evolutionary Algorithm (NDE). GA, all DE mutation strategies
and NDE are tested on sequences taken from the benchmark
data set “prefab4ref”, and the proposed NDE is observed to
outperform all the other algorithms.
Keywords: Sequence Alignment, Biological Sequences,
Pairwise Sequence Alignment, Multiple Sequence Alignment,
Genetic Algorithm, Differential Evolutionary Algorithm.
I. INTRODUCTION
Biological informatics, or bioinformatics for short, is a
combination of biology, computer science and information
technology. Many scientists now refer to it as computational
biology. Scientists are continuously striving to design new
algorithms to solve a wide range of biological problems [1].
Many bioinformatics tools and databases have been designed
and developed to analyse biological data and to store
biological information. Bioinformatics covers many areas,
such as genetics and proteomics. One of its most prominent
applications is sequence analysis and sequence alignment.
Sequence alignment mainly serves to determine analogous
regions within the specified biological sequences, such as
nucleotide or protein sequences.
Revised Manuscript Received on December 08, 2019
Lakshmi Naga Jayaprada. Gavarraju, Assoc. Prof., Dept. of Computer
Science & Engineering, Narasaraopeta Engineering College [Autonomous],
Narasaraopet, Guntur (Dt), A.P., India.
Kanadam Karteeka Pavan, Professor & Head, Department of Computer
Applications, R.V.R. & J.C. College of Engineering [Autonomous],
Chowdavaram, Guntur, A.P., India.
Identifying the analogous regions within the specified
biological sequences serves to find functional or structural
similarity, or to infer evolutionary relationships among the
sequences. Sequence alignment is mainly of two kinds,
depending on the number of sequences: pairwise and multiple
sequence alignment. Aligning two biological sequences is
called PSA, and aligning multiple biological sequences is
called MSA.
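
To make the idea of a pairwise alignment concrete, the following
minimal Python sketch treats an alignment as two rows of equal
length in which '-' marks a gap, and scores it with identity-style
and column-style scores of the kind discussed in this introduction.
The function names and the score values (+1 for a match, -1 for a
mismatch, -0.5 for a gap) are assumptions chosen for illustration;
the paper does not state concrete values at this point.

```python
GAP = "-"  # gap symbol used in the aligned rows (an illustrative choice)


def identity_score(row_a, row_b, match=1.0, mismatch=-1.0, gap=-0.5):
    """Sum a per-column score over two aligned rows of equal length.

    The default score values are illustrative assumptions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0.0
    for a, b in zip(row_a, row_b):
        if a == GAP or b == GAP:
            total += gap        # column containing a gap
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


def column_score(row_a, row_b):
    """Count columns whose two residues are identical (gaps never count)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != GAP)


if __name__ == "__main__":
    aligned_a = "ACG-TA"
    aligned_b = "AC-TTA"
    print(identity_score(aligned_a, aligned_b))
    print(column_score(aligned_a, aligned_b))
```

In this toy alignment four columns hold identical bases and two
columns hold a gap, so the two functions reward the same columns but
differ in how they penalise the rest.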
Sequence alignment is also divided into two categories
depending on the type of alignment: global [2] and local [3]
sequence alignment. Performing sequence alignment on amino
acid sequences is more appropriate than performing it on
nucleotide sequences, because amino acid sequences (protein
sequences) carry functional and structural information [4].
Many scoring functions are used to measure the similarity or
identity among sequences. When comparing nucleotide
sequences, a simple scoring function called the Identity
Score (IS) can be used, in which identical nucleotide bases
are assigned a positive score, dissimilar bases a negative
score, and a gap a less negative score. Another scoring
function, the Column Score (CS), can also be used: a column
in which identical nucleotide bases are present is assigned
a value of ‘1’, and any other column a value of ‘0’. For
protein sequences, a further scoring function called the
Similarity Score (SS) can be used along with IS and CS. In
SS, amino acids with analogous physicochemical properties
are assigned a value based on substitution matrices such as
Point Accepted Mutation (PAM) [5] and BLOcks SUbstitution
Matrix (BLOSUM) [6]. A variety of PSA techniques are
available to produce the best alignment of two given
sequences, both local and global. Dynamic programming has
traditionally been used to produce an optimal alignment of
two given sequences: the Smith-Waterman algorithm [7] for
local sequence alignment and the Needleman-Wunsch algorithm
[8] for global sequence alignment. Both algorithms share a
drawback when two or more optimal paths are generated and
traceback is needed: backtracking becomes complex [9, 10].
For this reason, many scientists have turned to
nature-inspired optimization algorithms. The Genetic
Algorithm (GA) is an optimization algorithm that has been
applied to the sequence alignment problem to produce an
optimal alignment. A multi-objective GA was developed by
Taneda for the pairwise alignment of RNA sequences [11].
Notredame et al. developed an algorithm for the optimal
alignment of RNA sequences by GA, called RAGA, and another
algorithm, Parallel GA (PRAGA) [12]. Cedric
Pairwise Sequence Alignment by Differential
Evolutionary Algorithm with New Mutation
Strategy
Lakshmi Naga Jayaprada.Gavarraju, K. Karteeka Pavan
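
The dynamic programming approach to global alignment mentioned in
the introduction can be sketched as follows. This is a minimal
Needleman-Wunsch-style example, not the authors' implementation. It
assumes a linear gap penalty and illustrative integer scores (+1
match, -1 mismatch, -1 gap), and the function name is hypothetical.
Where several optimal paths exist, the traceback simply prefers the
diagonal move and returns one of them, which sidesteps rather than
solves the backtracking issue described above.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Scores are illustrative assumptions; the gap penalty is linear.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Traceback from the bottom-right cell; ties are broken in the
    # order diagonal, up, left, so only one optimal path is reported.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(seq_a[i - 1])
                row_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(seq_a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(best)
    print(top)
    print(bottom)
```

The table has (n + 1) x (m + 1) cells, so time and memory both grow
with the product of the two sequence lengths.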