Parallel Implementation & Performance Evaluation of Blast Algorithm on Linux Cluster
Nisha Dhankher, O P Gupta
School of Electrical Engineering & IT, COAE&T, PAU Ludhiana, India

Abstract - The aim of this paper is to investigate the performance of a parallel implementation of the BLAST algorithm on an HPC platform using InfiniBand. This paper describes an optimized and extended version of mpiBLAST called mpiBLAST-PIO. Because of the high non-search overhead in mpiBLAST, having the slaves write the results in parallel emerged as an efficient solution to the problem.

Keywords: Bioinformatics, mpiBLAST, HPC, InfiniBand, Cluster Computing

I. INTRODUCTION

Genomic sequence search is a fundamental problem in computational biology that has benefited greatly from parallel and distributed computing. The most widely used sequence-search tool is BLAST, a fast heuristic program that computes approximate local pairwise alignments. By aligning (comparing) two biological sequences, researchers can infer evolutionary information about a new sequence. Similarities between a newly discovered sequence and a known sequence can help determine the function of the new sequence and identify related species that descend from a common ancestor.

There are two types of sequence alignment problems: global and local. A global alignment algorithm finds the best match between the entire sequences, whereas a local alignment algorithm finds the best match between parts of the sequences. The first sequence-alignment algorithms were Needleman-Wunsch (1970) and Smith-Waterman (1981). Both are based on dynamic programming and produce optimal alignments, but their time complexity is O(mn) for sequences of lengths m and n, i.e. O(n^2) for two sequences of similar length. As a result, Altschul et al proposed the heuristic BLAST algorithm in 1990. BLAST searches a DNA or protein query sequence against a database of known nucleotide or peptide sequences in approximately linear time, using a statistical model to assess the significance of the matches it reports.
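To make the quadratic cost of dynamic-programming alignment concrete, the following is a minimal Python sketch of the Smith-Waterman local-alignment recurrence. It is an illustration, not code from this paper: it returns only the best local score, uses a simple linear gap penalty, and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen for readability rather than BLAST or paper defaults.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b.

    Fills a (len(a)+1) x (len(b)+1) dynamic-programming table, so time
    and memory both grow as O(len(a) * len(b)). The default scores are
    illustrative assumptions, not BLAST parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned against a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at 0 lets a new local alignment start at any cell.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # ACGT vs AC-T: three matches and one gap -> 2 + 2 - 1 + 2
    print(smith_waterman_score("ACGT", "ACT"))  # prints 5
```

Every cell of the table is visited exactly once, which is the O(mn) cost noted above. BLAST avoids filling the full table by seeding on short word matches and extending only around them, which is where its speed advantage comes from.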
BLAST's heuristic search first breaks the query into words of length w (by default w=3 for protein searches) and compares them with each database sequence. The matching words (or seeds) are then extended in both directions until the alignment score drops below a threshold, forming a High-scoring Segment Pair (HSP). BLAST2 uses a two-hit method to find the top-scoring HSPs, which are combined to form consistent local alignments. BLAST's final result is a series of local alignments ordered by similarity score, each reported with an E-value. By translating sequences where necessary, BLAST can compare every combination of query and database sequence types. The BLAST search types are:
1. blastn: searches a nucleotide database using a nucleotide query.
2. blastp: searches a protein database using a protein query.
3. blastx: searches a protein database using a translated nucleotide query.
4. tblastn: searches a translated nucleotide database using a protein query.
5. tblastx: searches a translated nucleotide database using a translated nucleotide query.

Recent advances in molecular biology techniques have led to exponential growth of sequence databases. Because gains in CPU performance have not kept pace, traditional approaches to sequence homology search using BLAST have proven too slow to keep up with the current rate of sequence acquisition (Kent 2002). Since BLAST is computationally intensive and parallelizes well, researchers have proposed many parallel and distributed implementations of it.

The mpiBLAST Algorithm

mpiBLAST is a freely available, open-source parallelization of National Center for Biotechnology Information (NCBI) BLAST that achieves superlinear speedup by segmenting the BLAST database, so that each segment can fit in a node's main memory. It is designed to run on computer clusters using the MPI library and adopts a master-slave architecture (Darling et al 2003). The mpiBLAST algorithm consists of three steps:
1. Segmenting and distributing the database.
2. Running mpiBLAST queries on each node.
3. Merging the results from each node into a single output.

Before an mpiBLAST search, the database is formatted and segmented with a wrapper called mpiformatdb and placed on shared storage. The master node assigns query sequences and database fragments to the worker nodes. Each worker node performs the BLAST search on its queries and sends the results to the master node. When a worker node completes its task, the master node assigns it a new fragment, and this procedure is repeated until all the queries have been searched. The master node then merges all the results and sorts them by score. The output file can be written in several formats, including XML, HTML, plain text, and ASN.1. However, mpiBLAST suffers from non-search overheads that grow with the number of processors and vary with database size. To address this, Lin et al 2005 proposed pio-
Nisha Dhankher et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (3) , 2014, 4818-4820 www.ijcsit.com 4818