Volume 8, No. 5, May – June 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved 2307
ISSN No. 0976-5697
Comparison of Different Sequence Alignment Methods- A Survey
Yadvir Kaur
Student, Department of Computer Engineering,
Punjabi University, Patiala,
Punjab, India.
Neelofar Sohi
Assistant Professor, Department of Computer Engineering,
Punjabi University, Patiala,
Punjab, India.
Abstract: Bioinformatics is a promising and innovative research field. Biological sequence alignment is an integral part of bioinformatics
that helps to find similarity between biological sequences, i.e. DNA and protein. Aligning biological sequences helps to discover functional and
structural similarity between them. Biological sequence databases have been expanding rapidly as new sequences are found, which has
raised the demand for faster and more efficient algorithms. The past few decades have seen a surge of algorithms for finding optimal or
nearly optimal alignments. This paper focuses on the popular sequence alignment algorithms. Alignment methods are
discussed according to whether they produce optimal or approximate solutions. Optimal algorithms, which are based on dynamic
programming, give exact solutions but are computationally expensive. Stochastic optimization methods are identified
from the literature as potential candidates for solving complex multiple sequence alignments with better speed and accuracy.
Keywords: bioinformatics; sequence alignment; DNA; RNA; optimal alignment methods.
I. INTRODUCTION
Bioinformatics is an interdisciplinary research area at the
interface between biology, computer science, medicine and
statistics, as shown in Fig. 1. It is a union of biology and
informatics, as it involves computational techniques for the
storage, retrieval and manipulation of information related to
biomolecules, for example deoxyribonucleic acid (DNA),
ribonucleic acid (RNA) and proteins.
Figure 1. Bioinformatics – An interdisciplinary field.
Sequence alignment is an active research area within
bioinformatics. Sequence alignment compares two or more
sequences by aligning their residues (e.g., nucleotide bases of
DNA and RNA, or amino acids of a protein). The optimal
alignment procedure arranges the sequences so as
to maximize the number of identical residue matches.
Unaligned and aligned sequences are shown in Fig. 2.
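The notion of "identical residue matches" can be made concrete with a minimal sketch. The function name count_identical_matches, the gap symbol "-", and the two example sequences are assumptions chosen for illustration; they are not the sequences of Fig. 2, and the sketch only scores a given alignment rather than searching for the optimal one.

```python
def count_identical_matches(aligned_a: str, aligned_b: str, gap: str = "-") -> int:
    """Count columns in which both aligned sequences carry the same residue.

    Both inputs must already be aligned, i.e. have equal length, with
    `gap` marking inserted gap positions (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only when the residues are equal and not gaps.
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)


# Hypothetical example: the second sequence lacks one residue of the first.
with_gap = count_identical_matches("ACGTTGA", "AC-TTGA")   # gap inserted
without_gap = count_identical_matches("ACGTTG", "ACTTGA")  # naive, left-justified
print(with_gap, without_gap)
```

In this illustrative case, inserting a single gap raises the number of identical columns from 3 to 6, which is the quantity an alignment procedure of this kind tries to maximize.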
Sequence alignment serves several purposes. It is an
important step because it
helps to discover structural, functional and evolutionary
relationships between the aligned sequences. Biologists work
with these aligned sequences to build phylogenetic trees,
characterize protein families, and predict protein structure.
The analysis of sequences has helped biologists to detect
pathogens, develop drugs, and identify common genes.
The vast amount of biological data stored in the
form of DNA, RNA and protein sequences requires extensive
processing power to retrieve and analyse sequences quickly
and precisely. With new biological sequences being found
almost daily, biological sequence
databases are growing exponentially. This explosion of data
demands new algorithms that are both fast and accurate.
Many new algorithms have been proposed, of which the
well-known ones are examined in this paper.
A. Biological Sequences
A biological sequence is either a DNA, an RNA,
or an amino acid (protein) sequence.
DNA and RNA are composed of nucleotide bases.
Adenine (A), cytosine (C) and guanine (G) occur in both;
thymine (T) occurs in DNA, and uracil (U) takes its place
in RNA. Proteins, on the other hand, are
composed of amino acids. A DNA, RNA or protein sequence
(string) is written over its respective alphabet, as shown
below:
DNA (4 bases): {A,C,G,T}
RNA (4 bases): {A,C,G,U}
Proteins (20 amino acids):
{A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}.
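The three alphabets above can be expressed directly as sets, which gives a simple way to check which sequence types a string is compatible with. The following is a minimal sketch; the names DNA_ALPHABET, RNA_ALPHABET, PROTEIN_ALPHABET and compatible_alphabets are hypothetical, and ambiguity codes (such as N for an unknown base) are deliberately not handled.

```python
from typing import List

# Alphabets exactly as listed in the text.
DNA_ALPHABET = frozenset("ACGT")
RNA_ALPHABET = frozenset("ACGU")
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")


def compatible_alphabets(sequence: str) -> List[str]:
    """Return the names of all alphabets that contain every symbol of `sequence`."""
    symbols = set(sequence.upper())
    candidates = (
        ("DNA", DNA_ALPHABET),
        ("RNA", RNA_ALPHABET),
        ("protein", PROTEIN_ALPHABET),
    )
    # An empty sequence is treated as compatible with nothing.
    return [name for name, alphabet in candidates if symbols and symbols <= alphabet]


print(compatible_alphabets("ACGU"))   # contains U, so DNA and protein are excluded
print(compatible_alphabets("MKVLA"))  # residues outside the nucleotide alphabets
print(compatible_alphabets("ACGT"))   # also a valid string over the protein alphabet
```

Because A, C, G and T are also amino acid codes, a string such as ACGT is compatible with both the DNA and the protein alphabet; the alphabet alone does not always determine the sequence type, so the sketch returns every compatible alphabet rather than a single guess.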
Figure 2. Unaligned and aligned sequences.