Enhancing Parallelism of Pairwise Statistical Significance Estimation for Local Sequence Alignment

Yuhong Zhang 1,2, Md. Mostofa Ali Patwary 2, Sanchit Misra 2, Ankit Agrawal 2, Wei-keng Liao 2, and Alok Choudhary 2

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Email: yuhongzhang@uestc.edu.cn
2 Department of Electrical Engineering and Computer Science, Northwestern University, USA
Email: {mpatwary, smi539, ankitag, wkliao, choudhar}@eecs.northwestern.edu

Abstract—Pairwise statistical significance (PSS) has been found to accurately identify related sequences (homology detection), which is a fundamental step in numerous applications relating to sequence analysis. Although PSS is more accurate than database statistical significance, constructing the empirical score distribution during its estimation is both computationally intensive and data intensive, which poses a big challenge in terms of performance and scalability. Multicore computers and clusters have become increasingly ubiquitous and more powerful than before. In this paper, we evaluate the use of OpenMP, MPI and hybrid paradigms to accelerate the estimation of PSS of local sequence alignment. By distributing the compute-intensive kernels of the pairwise statistical significance estimation procedure across multiple computational units, we achieve a speedup of up to 113.10× using 128 cores.

Keywords-Pairwise statistical significance; Multicore; OpenMP; MPI; Hybrid

I. INTRODUCTION

Recent decades have witnessed a dramatic increase in the quantity and variety of publicly available proteomic and genomic sequence data. GenBank, for example, as of August 2011, has accumulated more than 10^12 nucleotides of nucleic acid sequence data, and continues to grow at an exponential rate, approximately doubling every 18 months [1].
Dealing with the massive quantities of data pouring from the sequencing factories, making sense of them, and rendering them accessible to people who are working on a wide variety of problems is a big challenge in bioinformatics [2, 3].

Pairwise sequence alignment (PSA) is widely used in the analysis of DNA and protein sequences [4, 5]. It forms the basic platform for many other biological applications such as homology detection, protein structure prediction, finding protein function and deciphering evolutionary relationships. Sequence alignment is an effective method that reports a score indicating the relatedness between sequences. Generally, a higher score indicates that the sequences are more related. However, the alignment score depends on various factors such as the alignment program, the scoring scheme, the sequence lengths, and the composition of the sequences under comparison [6]. Drawing a conclusion about the relatedness of a pair of sequences from the score alone may not make sense. It is therefore more appropriate to measure the quality of a PSA using the statistical significance of the score rather than the score itself [7]. An alignment score is more statistically significant if it has a low probability of occurring by chance. The statistical significance of a sequence alignment score is very important for judging whether an observed sequence similarity could imply a functional or evolutionary link, or is a chance event [6].

Pairwise statistical significance (PSS) is a promising method for evaluating the statistical significance of an alignment; it is specific to the sequence pair being aligned and independent of any database [8]. However, the estimation of PSS is very data-intensive and computation-intensive [9]. Therefore, applying high performance computing (HPC) techniques is an obvious choice to accelerate the estimation.
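To make the idea of "a low probability of occurring by chance" concrete, the following Python fragment is a minimal sketch and not the estimation procedure used in this paper. It assumes a toy scoring scheme (match = 2, mismatch = -1, linear gap = -2) in place of a substitution matrix, and it reports a plain empirical p-value from shuffled sequences. The function names `local_alignment_score` and `empirical_p_value`, their parameters, and the shuffle count are hypothetical and chosen only for illustration.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the paper's scheme.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def empirical_p_value(seq1, seq2, num_shuffles=200, seed=0):
    """Estimate how often a shuffled seq2 scores at least as high as seq2.

    Returns (observed_score, p_value). The +1 terms keep the estimate
    strictly positive for a finite number of shuffles.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(seq1, seq2)
    letters = list(seq2)
    hits = 0
    for _ in range(num_shuffles):
        rng.shuffle(letters)  # preserves length and composition of seq2
        if local_alignment_score(seq1, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (num_shuffles + 1)
```

In this sketch every shuffle requires one full alignment, so the cost grows with the number of shuffles times the product of the sequence lengths, which illustrates why building an empirical score distribution is expensive. The shuffles are also independent of one another, which makes a loop of this shape a natural candidate for distribution across cores.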
Although FPGAs [10] and GPUs [11] have been used to accelerate the estimation of pairwise statistical significance, researchers in the bioinformatics community must acquire specialized knowledge to use them properly. Moreover, the special hardware requirements limit the utilization of those high performance platforms. On the other hand, multicore computers, laptops and clusters have become increasingly ubiquitous and more powerful. Therefore, it is of interest to use high performance technologies to unlock the potential of these computers, laptops and clusters. Based on this observation and motivation, in this paper we present OpenMP, MPI and hybrid (OpenMP + MPI) implementations to accelerate the estimation of PSS. After careful performance analysis, we have efficiently distributed the compute-intensive kernels of the algorithm across processors (cores), so as to reap the maximum benefits of the OpenMP and/or MPI paradigms. Our experiments show that our parallelization methodology achieves high performance for these applications. The maximum speedups of the OpenMP, MPI and hybrid implementations are 18.94×, 22.58× and 22.65×,