Massively Parallelized DNA Motif Search on the Reconfigurable Hardware Platform COPACOBANA

Jan Schröder, Lars Wienbrandt, Gerd Pfeiffer, and Manfred Schimmler

Department of Computer Science, Christian-Albrechts-University of Kiel, Germany
{jasc,lwi,gp,masch}@informatik.uni-kiel.de

Abstract. An enhanced version of the existing motif search algorithm BMA is presented. Motif searching is a computationally expensive task that is frequently performed in DNA sequence analysis. The algorithm has been tailored to the COPACOBANA architecture, a massively parallel machine consisting of 120 FPGA chips. The resulting performance exceeds that of a standard PC by a factor of more than 1,650, which greatly speeds up the time-intensive search for motifs in DNA sequences. In terms of energy consumption, COPACOBANA needs only 1/400 of the energy of a PC implementation.

Key words: motif finding, DNA sequence analysis, FPGA, High Performance Reconfigurable Computing (HPRC)

1 Introduction

The discovery of regulatory sequences in DNA, called motif finding, is one of the most challenging problems in bioinformatics. In fact, some instances of the motif-finding problem cannot be solved by current techniques. Two factors make the problem so difficult. First, the parameters of a given problem instance (such as sequence length, motif length, and degree of mutation) can make it impossible to distinguish motifs from background noise. Second, the search is computationally expensive, so an exact algorithm can fail to discover a motif in a given sequence simply because its execution time exceeds any reasonable limit. We address both problems with a new approach to motif searching that uses a novel massively parallel architecture to reduce the execution time.

Motif searching has been the subject of many publications over the last ten years.
Among the most popular approaches to this topic are MEME [16] [17] [18] and the similar Gibbs sampler [14] [15], which iteratively develop matrices representing motifs of the input sequence using the expectation maximization technique; the projection algorithm [13] [20], which creates a representation of the highly conserved region across all motif instances; and CONSENSUS [21], a greedy approach that constructs likely motif candidates by aligning only small parts of the genome at a time.
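The first difficulty named in the introduction, that certain parameter combinations make motifs indistinguishable from background noise, can be made concrete with a standard back-of-the-envelope estimate. The sketch below is illustrative only and rests on assumptions not stated in this section: a planted (l, d) model, in which a motif occurrence is any l-mer within Hamming distance d of the motif, and an i.i.d. uniform background over {A, C, G, T}. The function names `neighbor_prob` and `expected_spurious_motifs` are hypothetical and are not part of BMA or of the COPACOBANA implementation.

```python
from math import comb

# Illustrative assumptions (not from the paper):
#  - background DNA is i.i.d. uniform over {A, C, G, T};
#  - an "occurrence" of a motif is any l-mer within Hamming distance d of it.


def neighbor_prob(l: int, d: int) -> float:
    """Probability that a random l-mer lies within Hamming distance d of a fixed l-mer."""
    # Choose i mismatching positions; each mismatch can be any of 3 other bases.
    hits = sum(comb(l, i) * 3 ** i for i in range(d + 1))
    return hits / 4 ** l


def expected_spurious_motifs(l: int, d: int, t: int, n: int) -> float:
    """Approximate expected number of l-mers with at least one d-neighbor
    in every one of t random sequences of length n.

    The n - l + 1 windows of a sequence are treated as independent, which
    is only an approximation, because overlapping windows are correlated.
    """
    p = neighbor_prob(l, d)
    windows = n - l + 1
    per_sequence = 1 - (1 - p) ** windows  # P(at least one hit in one sequence)
    return 4 ** l * per_sequence ** t


if __name__ == "__main__":
    # Hypothetical parameters: 20 sequences of length 600, varying (l, d).
    for l, d in [(15, 4), (15, 5), (11, 3)]:
        print(f"(l={l}, d={d}): {expected_spurious_motifs(l, d, t=20, n=600):.3g}")
```

When the estimate approaches or exceeds 1, random background alone is expected to contain an l-mer that matches every sequence as well as a planted motif would. In that regime, no algorithm, however fast, can single out the true motif from the data alone. Raising d for a fixed l moves an instance toward this regime quickly.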