A RANDOMIZED ALGORITHM FOR FAST SEQUENCE ALIGNMENT

Akash Nag¹, Dr. Sunil Karforma²
¹Research Scholar, ²Associate Professor, Department of Computer Science, The University of Burdwan (India)

ABSTRACT
Sequence alignment is one of the most important tools available to molecular biologists for finding similarities between two or more DNA or protein sequences. However, despite several advances in this field, multiple sequence alignment (MSA) is often a slow process, and optimal results still cannot be guaranteed. MSA is an important prerequisite for constructing phylogenetic trees, which biologists often use to trace the evolution of closely related species. In this paper, we propose a randomized algorithm that can produce a sufficiently good MSA, for huge datasets, in a fraction of the time taken by contemporary MSA algorithms.

Keywords: Bioinformatics, Multiple Sequence Alignment, Randomized Algorithms, Sequence Analysis

I. INTRODUCTION
Ever since sequence alignment gained significance, a large number of algorithms have been published. Most of these can be divided into two categories: pairwise sequence alignment algorithms and multiple sequence alignment algorithms. Pairwise sequence alignment is probably of mostly theoretical interest, as most problems deal with multiple sequences. However, we need to understand how pairwise sequence alignment works in order to proceed to multiple sequence alignment. Compared to MSA algorithms, pairwise sequence alignment algorithms can produce optimal results because their input is small (only two sequences). Several dynamic programming algorithms are available for this, the most famous being that of Needleman and Wunsch [1] and that of Smith and Waterman [2]. The former deals with global alignments, the latter with local alignments. Both produce the best possible alignment of the two given sequences under the chosen scoring scheme.
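As an illustration of the dynamic programming idea behind global pairwise alignment, the following is a minimal sketch in the style of Needleman and Wunsch [1]. It computes only the optimal score, not the alignment itself. The function name and the scoring values (match = +1, mismatch = -1, linear gap penalty = -1) are assumptions chosen for illustration; they are not taken from this paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions: each aligned pair adds
    `match` or `mismatch`, and each gap position adds `gap` (linear penalty).
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b. Row 0: b's prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]

    for i in range(1, len(a) + 1):
        # Column 0: the first i characters of a aligned against gaps only.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Option 1: align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Option 2: align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Option 3: align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr[j] = max(diag, up, left)
        prev = curr

    return prev[len(b)]


if __name__ == "__main__":
    # "ACGT" vs "AGT": three matches and one gap give a score of 2.
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells, so the running time is proportional to the product of the two sequence lengths. Only two rows are kept in memory here because the sketch returns the score alone; recovering the alignment itself would require a traceback over the full table.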
The time complexity of these algorithms was later improved, most notably by Gotoh [3] and then by Altschul and Erickson [4].

As DNA sequencing techniques improved, sequence databases started growing exponentially. This introduced a new problem: database search. Given a query sequence, an algorithm was required that could report the most closely matching sequence in the database. A naïve solution would be to perform a pairwise sequence alignment of the query sequence against every sequence in the database, and then report the entry for which the alignment score was the highest. The alignment score is the number of positions in the aligned pair at which the nucleotides (or amino acids, in the case of proteins) match, reduced by the penalties for mismatches and gaps. However, the time complexity of the dynamic programming algorithms, as well as the huge size of the databases