G. Allen et al. (Eds.): ICCS 2009, Part I, LNCS 5544, pp. 954–963, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Pairwise Distance Matrix Computation for Multiple
Sequence Alignment on the Cell Broadband Engine
Adrianto Wirawan, Bertil Schmidt, and Chee Keong Kwoh
School of Computer Engineering, Nanyang Technological University, Singapore 639798
{adri0004,asbschmidt,asckkwoh}@ntu.edu.sg
Abstract. Multiple sequence alignment is an important tool in bioinformatics.
Although efficient heuristic algorithms exist for this problem, the exponential
growth of biological data demands an even higher throughput. The recent
emergence of accelerator technologies has made it possible to achieve substantially
improved execution times for many bioinformatics applications compared to
general-purpose platforms. In this paper, we demonstrate how the PlayStation®3,
powered by the Cell Broadband Engine, can be used as a computational platform
to accelerate the distance matrix computation utilized in multiple sequence
alignment algorithms.
Keywords: multiple sequence alignment, Cell Broadband Engine.
1 Introduction
Multiple sequence alignment (MSA) of nucleotide or amino acid sequences is an
important tool in bioinformatics. It can identify patterns or motifs to characterize
protein families, and is therefore utilized to detect homology between sequences as
well as to perform phylogenetic analysis. Many MSA heuristics have been proposed
to reduce the exponential complexity of computing optimal MSAs. Heuristic MSA
implementations include MSA[1], ClustalW[2], T-Coffee[3], MAFFT[4], DIALIGN
P[5] and PRALINE[6]. ClustalW[2] has over 26,000 citations in the ISI Web of
Science and is considered to be one of the most popular MSA tools. It is based on the
progressive alignment method. Although not optimal, this method can produce
reasonably good alignments efficiently. However, the exponential growth of
biological data demands an even better throughput. Thus, software approaches to
improve the performance of ClustalW have been introduced, including caching[8, 9]
and parallel processing[10-12].
The recent emergence of accelerator technologies such as FPGAs, GPUs and
specialized processors has made it possible to improve execution
time for many bioinformatics applications compared to current general-purpose
platforms at a low cost. Recent uses of easily accessible accelerator technologies to
improve the ClustalW algorithm include FPGA[13] and GPU[14] implementations.
Our profiling of ClustalW has revealed that distance matrix computation is the most
time-consuming stage and typically takes up more than 90% of the overall runtime.
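To illustrate why this stage tends to dominate, the following minimal sketch fills a symmetric distance matrix by evaluating every unordered pair of sequences once. The function names and the toy distance (fraction of mismatched positions over the shorter sequence, without gaps) are assumptions made for illustration only; they stand in for the alignment-based distance used by ClustalW and are not the implementation described in this paper.

```python
from itertools import combinations

def toy_distance(a: str, b: str) -> float:
    """Hypothetical placeholder: fraction of mismatched positions over the
    shorter sequence, with no gaps. Not ClustalW's alignment-based measure."""
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - matches / n

def distance_matrix(seqs, distance=toy_distance):
    """Fill a symmetric N x N matrix, evaluating each unordered pair once."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = distance(seqs[i], seqs[j])
    return d

seqs = ["ACGT", "ACGA", "TCGA"]
matrix = distance_matrix(seqs)
```

For N sequences the loop performs N(N-1)/2 distance evaluations, and each evaluation is independent of the others, which is what makes this stage a natural candidate for parallel accelerators.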