Space and Time Efficient Algorithms for Planted Motif Search

Jaime Davila, Sudha Balla, and Sanguthevar Rajasekaran

CSE Department at University of Connecticut, Storrs
{jdavila, ballasudha, rajasek}@engr.uconn.edu

Abstract. We consider the (l, d) Planted Motif Search Problem, a problem that arises from the need to find transcription factor-binding sites in genomic information. We propose the algorithms PMSi and PMSP, which are based on ideas considered in PMS1 [10]. These algorithms are exact, use less space than known exact algorithms such as PMS, and are able to tackle instances with large values of d. In particular, algorithm PMSP is able to solve the challenge instance (17, 6), which has not been reported solved before in the literature.

1 Introduction

The Planted Motif Search Problem arises from the need to find transcription factor-binding sites in genomic information and has been studied extensively in the biocomputing literature (see [11] for a literature survey). The problem can be defined formally as follows.

Definition 1. Let $s$ be a string with $|s| = m$ and $x$ a string with $|x| = l$, where $l < m$. We say $x \triangleright_l s$ if $x$ is a contiguous subsequence (that is, a substring) of $s$. Equivalently, we say that $x$ is an $l$-mer of $s$.

Definition 2. Given a set of strings $\{s_i\}_{i=1}^{n}$ over an alphabet $\Sigma$, with $|s_i| = m$, and integers $l, d$ with $0 \le d < l < m$, we define the $(l, d)$ motif search problem as that of finding a string $x$ with $|x| = l$ such that $s_i$ has an $l$-mer $x_i$ with $d_H(x, x_i) = d$ for $i = 1, \ldots, n$. We will call $x$ a motif.

This problem is known to be NP-complete [6], and a PTAS exists for variants of the problem known as the Common Approximate Substring and Common Approximate String [7], [1]. However, the high degree of the polynomial in the running time of the PTAS makes it of little practical use.

Numerous algorithms have been implemented in order to solve instances of this problem.
Among them we have Random Projection [2], MITRA [4], Winnower [8], Pattern Branching [9], Hybrid Sample and Pattern Driven Approaches [12], PMS1 [10], CENSUS [5] and Voting [3].

This research was supported in part by the NSF Grant ITR-0326155.

V.N. Alexandrov et al. (Eds.): ICCS 2006, Part II, LNCS 3992, pp. 822–829, 2006. © Springer-Verlag Berlin Heidelberg 2006
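
To make Definitions 1 and 2 concrete, the following is a minimal brute-force sketch in Python. It is not PMSi, PMSP, or any of the algorithms cited above. It assumes that d_H denotes the Hamming distance and follows the definition as printed (distance equal to d). The function names, the DNA alphabet default, and the `at_most` switch (which relaxes the condition to "distance at most d") are hypothetical and introduced only for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which the equal-length strings a and b differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def lmers(s, l):
    """All l-mers (contiguous substrings of length l) of s, left to right."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def is_motif(x, strings, d, at_most=False):
    """Check the condition of Definition 2 for candidate x.

    By default an l-mer must lie at Hamming distance exactly d from x,
    as the definition is printed. at_most=True is a hypothetical switch
    that accepts any distance <= d instead.
    """
    l = len(x)

    def close(y):
        dist = hamming(x, y)
        return dist <= d if at_most else dist == d

    # Every input string must contain at least one qualifying l-mer.
    return all(any(close(y) for y in lmers(s, l)) for s in strings)


def brute_force_motifs(strings, l, d, alphabet="ACGT", at_most=False):
    """Test all len(alphabet)**l candidate strings; feasible only for tiny l."""
    candidates = ("".join(c) for c in product(alphabet, repeat=l))
    return [x for x in candidates if is_motif(x, strings, d, at_most)]
```

For example, `is_motif("GTA", ["GTC", "GAA", "CTA"], 1)` holds because each input string differs from `GTA` in exactly one position. The exhaustive search grows as |Σ|^l, which makes it impractical for instances such as (17, 6) and illustrates why dedicated algorithms are needed.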