Text Searching on a Heterogeneous Cluster of Workstations

Panagiotis D. Michailidis and Konstantinos G. Margaritis

Parallel and Distributed Processing Laboratory
Department of Applied Informatics, University of Macedonia
156 Egnatia str., P.O. Box 1591, 54006, Thessaloniki, Greece
{panosm,kmarg}@uom.gr
http://macedonia.uom.gr/~{panosm,kmarg}

Abstract. In this paper we propose a high-performance flexible text searching implementation on a heterogeneous cluster of workstations using the MPI message passing library. We test this parallel implementation and present experimental results for different text sizes and numbers of workstations.

1 Introduction

Text searching is a very important component of many problems, including text processing, information retrieval, pattern recognition and DNA sequencing. Especially with the introduction of search engines that deal with the tremendous amount of textual information on the World Wide Web (WWW), as well as the research on DNA sequencing, this problem deserves special attention, and any improvement that speeds up the process will benefit these important applications.

The basic text searching problem can be defined as follows. Given an alphabet Σ (a finite set of characters), a short pattern string P = P[1]P[2]...P[m] of length m and a large text string T = T[1]T[2]...T[n] of length n, where both the pattern and the text are sequences of characters from Σ and m ≪ n, the text searching problem consists of finding one or, more generally, all exact occurrences of the pattern P in the text T. Surveys and experimental results of well-known algorithms for this text searching problem can be found in [3], [9], [11], [16].

The implementation of the text searching problem on a cluster of workstations or PCs [1] can provide the computing power required to speed up searching on large free-text collections.
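As an illustration of the problem definition above, the following minimal sketch reports all exact occurrences of a pattern P in a text T by direct character comparison, i.e. the brute-force approach. It is not the implementation evaluated in this paper: the function name brute_force_search and the use of Python are assumptions made purely for illustration, and positions are reported 1-based to match the T[1]...T[n] notation.

```python
from typing import List


def brute_force_search(text: str, pattern: str) -> List[int]:
    """Return the 1-based start positions of all exact occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    # An empty pattern or a pattern longer than the text has no occurrences here.
    if m == 0 or m > n:
        return []
    positions = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare characters left to right until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            positions.append(i + 1)  # convert to 1-based indexing
    return positions


if __name__ == "__main__":
    print(brute_force_search("abracadabra", "abra"))  # prints [1, 8]
```

In the worst case this sketch performs on the order of m·n character comparisons, which is one reason why large text collections motivate distributing the search over several machines.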
In [8], [12] five sequential text searching algorithms were parallelised and tested on a homogeneous cluster, giving very positive experimental results. In [13] a performance prediction model was proposed for the static master-worker model on a homogeneous cluster. In [10] a parallel text searching implementation was presented for the static master-worker model, and results were reported for the Brute-Force text searching algorithm [11] on a heterogeneous cluster.

Y. Cotronis and J. Dongarra (Eds.): Euro PVM/MPI 2001, LNCS 2131, pp. 378–385, 2001. © Springer-Verlag Berlin Heidelberg 2001

The contribution of this work is the implementation of a parallel flexible text searching algorithm using a cluster computing technique. This algorithm realized