Volume 8, No. 5, May – June 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
ISSN No. 0976-5697
© 2015-19, IJARCS All Rights Reserved

Comparison of Different Sequence Alignment Methods - A Survey

Yadvir Kaur
Student, Department of Computer Engineering, Punjabi University, Patiala, Punjab, India.

Neelofar Sohi
Assistant Professor, Department of Computer Engineering, Punjabi University, Patiala, Punjab, India.

Abstract: Bioinformatics is a promising and innovative research field. Biological sequence alignment is an integral part of bioinformatics; it helps to find similarity between biological sequences, i.e., DNA and protein. Aligning biological sequences helps to discover functional and structural similarity between them. Biological sequence databases have been expanding rapidly as new sequences are found, which has raised the demand for faster and more efficient algorithms. Many algorithms for finding optimal or near-optimal alignments have emerged over the past few decades. This paper focuses on the popular sequence alignment algorithms. Different types of alignment methods are discussed according to whether they give optimal or approximate solutions. Optimal algorithms, which are based on dynamic programming, give exact solutions, but they are computationally expensive. Stochastic optimization methods have been selected from the literature as potential candidates for solving complex multiple sequence alignments with better speed and care.

Keywords: bioinformatics; sequence alignment; DNA; RNA; optimal alignment methods.

I. INTRODUCTION

Bioinformatics is an interdisciplinary research area at the interface between biology, computer science, medicine and statistics, as shown in Fig. 1.
It is a union of biology and informatics, as it involves computational techniques for the storage, retrieval and manipulation of information related to biomolecules, for example deoxyribonucleic acid (DNA), ribonucleic acid (RNA) and proteins.

Figure 1. Bioinformatics – An interdisciplinary field.

Sequence alignment is an active research area within bioinformatics. Sequence alignment compares two or more sequences in order to align their residues (e.g., nucleotide bases of DNA and RNA, or amino acids of a protein). An optimal alignment procedure arranges the sequences so as to maximize the number of identical residue matches. Unaligned and aligned sequences are shown in Fig. 2.

Sequence alignment serves several purposes. It is an important step because it helps to discover structural, functional and evolutionary relationships between the aligned sequences. Biologists work with aligned sequences to build phylogenetic trees, characterize protein families, and predict protein structure. The analysis of sequences has helped biologists to detect pathogens, develop drugs, and identify common genes.

The vast amount of biological data stored in the form of DNA, RNA and protein sequences requires extensive processing power to retrieve and analyse sequences quickly and precisely. With new biological sequences being found almost daily, biological sequence databases are growing exponentially. This explosion of data demands new algorithms that are fast yet effective. Many new algorithms have been proposed, and the best-known of them are examined in this paper.

A. Biological Sequences

A biological sequence is either a DNA, an RNA, or an amino acid (protein) sequence. DNA and RNA are composed of nucleotide bases. The nucleotide bases are adenine (A), thymine (T), cytosine (C), guanine (G), and uracil (U); thymine occurs in DNA, while uracil takes its place in RNA. Proteins, on the other hand, are composed of amino acids.
A DNA, RNA or protein sequence (string) is written over its respective alphabet, as shown below:

DNA (4 bases): {A, C, G, T}
RNA (4 bases): {A, C, G, U}
Proteins (20 amino acids): {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}

Figure 2. Unaligned and aligned sequences.
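
To make the alphabets listed above concrete, and to illustrate the "identical residue matches" that an alignment seeks to maximize, a minimal Python sketch is given below. It is only an illustration and not a method taken from this paper: the function names `compatible_types` and `count_identical_matches`, the example strings, and the use of "-" as the gap symbol are all assumptions made for this example.

```python
# Alphabets as listed in the text; all names below are hypothetical.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")

ALPHABETS = {
    "DNA": DNA_ALPHABET,
    "RNA": RNA_ALPHABET,
    "protein": PROTEIN_ALPHABET,
}


def compatible_types(sequence):
    """Return the names of all alphabets that contain every symbol of `sequence`."""
    symbols = set(sequence.upper())
    if not symbols:
        return []
    return [name for name, alphabet in ALPHABETS.items() if symbols <= alphabet]


def count_identical_matches(aligned_a, aligned_b, gap="-"):
    """Count the columns in which two already-aligned sequences share a residue.

    Both inputs must have the same length; `gap` (assumed to be "-") marks
    an inserted gap and never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)


if __name__ == "__main__":
    print(compatible_types("ACGU"))                     # fits the RNA alphabet only
    print(compatible_types("ACGT"))                     # fits both DNA and protein
    print(count_identical_matches("AC-GT", "ACCGA"))    # one gapped pair, scored
```

Note that the alphabets overlap: A, C, G and T are also amino acid codes, so alphabet membership alone cannot decide whether a string such as "ACGT" is DNA or a short peptide. For that reason the sketch returns every compatible type rather than a single label.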