Enhancing Parallelism of Pairwise Statistical Significance Estimation for Local Sequence Alignment

Yuhong Zhang 1,2, Md. Mostofa Ali Patwary 2, Sanchit Misra 2, Ankit Agrawal 2, Wei-keng Liao 2, and Alok Choudhary 2

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. Email: yuhongzhang@uestc.edu.cn
2 Department of Electrical Engineering and Computer Science, Northwestern University, USA. Email: {mpatwary, smi539, ankitag, wkliao, choudhar}@eecs.northwestern.edu

Abstract—Pairwise statistical significance (PSS) has been found to accurately identify related sequences (homology detection), which is a fundamental step in numerous applications relating to sequence analysis. Although PSS is more accurate than database statistical significance, constructing the empirical score distribution during its estimation is both computationally intensive and data intensive, which poses a significant challenge for performance and scalability. Multicore computers and clusters have become increasingly ubiquitous and more powerful than before. In this paper, we evaluate the use of OpenMP, MPI and hybrid paradigms to accelerate the estimation of PSS of local sequence alignment. By distributing the compute-intensive kernels of the pairwise statistical significance estimation procedure across multiple computational units, we achieve a speedup of up to 113.10× using 128 cores.

Keywords-Pairwise statistical significance; Multicore; OpenMP; MPI; Hybrid

I. INTRODUCTION

Recent decades have witnessed a dramatic increase in the quantity and variety of publicly available proteomic and genomic sequence data. GenBank, for example, had accumulated more than 10^12 nucleotides of nucleic acid sequence data as of August 2011, and it continues to grow at an exponential rate, approximately doubling every 18 months [1].
Dealing with the massive quantities of data pouring from the sequencing factories, making sense of them, and rendering them accessible to people working on a wide variety of problems is a major challenge in bioinformatics [2, 3].

Pairwise sequence alignment (PSA) is widely used in the analysis of DNA and protein sequences [4, 5]. It forms the basis for many other biological applications such as homology detection, protein structure prediction, finding protein function and deciphering evolutionary relationships.

Sequence alignment reports a score that indicates the relatedness between sequences. Generally, a higher score indicates that the sequences are more closely related. However, the alignment score depends on various factors such as the alignment program, the scoring scheme, the sequence lengths, and the compositions of the sequences under comparison [6]. Drawing a conclusion about the relatedness of a pair of sequences from the score alone may therefore be misleading, and it is more appropriate to measure the quality of a PSA using the statistical significance of the score rather than the score itself [7]. An alignment score is more statistically significant if it has a low probability of occurring by chance. The statistical significance of a sequence alignment score is essential for deciding whether an observed sequence similarity implies a functional or evolutionary link or is merely a chance event [6]. Pairwise statistical significance (PSS) is a promising method for evaluating the statistical significance of an alignment; it is specific to the sequence pair being aligned and independent of any database [8]. However, the estimation of PSS is very data-intensive and computation-intensive [9]. Applying high performance computing (HPC) techniques is therefore an obvious choice to accelerate the estimation.
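To make the notion of statistical significance described above concrete, the following is a minimal sketch of how an empirical score distribution can be built by repeatedly shuffling one sequence and re-aligning it. It is an illustration under stated assumptions, not the estimation procedure evaluated in this paper: the function names, the match/mismatch/gap scoring values, the number of shuffles, and the use of a raw empirical fraction as the p-value are all hypothetical choices made for brevity.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the substitution
    matrix or gap scheme used in the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def empirical_p_value(a, b, num_shuffles=200, seed=0):
    """Estimate how often a shuffled copy of b scores at least as well as b.

    Hypothetical helper: returns (observed_score, p_value), where p_value is
    the fraction of shuffles reaching the observed score, with a +1
    correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(num_shuffles):
        rng.shuffle(letters)  # keeps length and composition of b
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (num_shuffles + 1)
```

Each iteration of the shuffle loop in this sketch performs one full alignment and does not depend on any other iteration, so the cost grows with the number of shuffles and with the product of the sequence lengths.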
Although FPGAs [10] and GPUs [11] have been used to accelerate the estimation of pairwise statistical significance, researchers in the bioinformatics community need specialized knowledge to use them properly. Moreover, the requirement for special hardware limits the adoption of those high performance platforms. On the other hand, multicore desktops, laptops and clusters have become increasingly ubiquitous and more powerful. It is therefore of interest to use high performance technologies to unlock the potential of these widely available machines. Motivated by this observation, in this paper we present OpenMP, MPI and hybrid (OpenMP + MPI) implementations to accelerate the estimation of PSS. After careful performance analysis, we have efficiently distributed the compute-intensive kernels of the algorithm across processors (cores), so as to reap the maximum benefits of the OpenMP and/or MPI paradigms. Our experiments show that our parallelization methodology achieves high performance for these applications. The maximum speedups of the OpenMP, MPI and hybrid implementations are 18.94×, 22.58× and 22.65×,