Text Searching on a Heterogeneous Cluster of Workstations

Panagiotis D. Michailidis and Konstantinos G. Margaritis

Parallel and Distributed Processing Laboratory
Department of Applied Informatics, University of Macedonia
156 Egnatia str., P.O. Box 1591, 54006, Thessaloniki, Greece
{panosm,kmarg}@uom.gr
http://macedonia.uom.gr/~{panosm,kmarg}

Abstract. In this paper we propose a high-performance flexible text searching implementation on a heterogeneous cluster of workstations using the MPI message-passing library. We test this parallel implementation and present experimental results for different text sizes and numbers of workstations.

1 Introduction

Text searching is a very important component of many problems, including text processing, information retrieval, pattern recognition and DNA sequencing. With the introduction of search engines that handle the tremendous amounts of textual information available on the World Wide Web (WWW), as well as the research on DNA sequencing, this problem deserves special attention, and any improvement that speeds up the process will benefit these important applications.

The basic text searching problem can be defined as follows. We are given an alphabet Σ (a finite sequence of characters), a short pattern string P=P[1]P[2]...P[m] of length m and a large text string T=T[1]T[2]...T[n] of length n, where both the pattern and the text are sequences of characters from Σ, with m ≤ n. The text searching problem consists of finding one or, more generally, all the exact occurrences of the pattern P in the text T. Surveys and experimental results of well-known algorithms for this text searching problem can be found in [3], [9], [11], [16].

The implementation of the text searching problem on a cluster of workstations or PCs [1] can provide the computing power required to speed up searching on large free text collections.
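To make the problem definition above concrete, the following is a minimal sequential sketch of exact text searching by direct character comparison. It is an illustration only, not the authors' implementation: the function name `find_occurrences` is hypothetical, and the choice to report 1-based positions is an assumption made to match the P[1]...P[m] and T[1]...T[n] notation.

```python
from typing import List


def find_occurrences(pattern: str, text: str) -> List[int]:
    """Return the 1-based start positions of all exact occurrences of pattern in text.

    Hypothetical helper for illustration; positions are 1-based to follow
    the P[1..m], T[1..n] notation of the problem definition.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        # The definition assumes a non-empty pattern with m <= n.
        return []
    positions = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare characters left to right until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            positions.append(i + 1)
    return positions


print(find_occurrences("aba", "ababa"))  # prints [1, 3]
```

Overlapping occurrences are reported separately, which is why the pattern `aba` is found twice in `ababa`.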
In [8], [12] five sequential text searching algorithms were parallelised and tested on a homogeneous cluster, giving very positive experimental results. In [13] a performance prediction model was proposed for a static master-worker model on a homogeneous cluster. In [10] a parallel text searching implementation was presented for a static master-worker model, and results were reported for the Brute-Force text searching algorithm [11] on a heterogeneous cluster.

Y. Cotronis and J. Dongarra (Eds.): Euro PVM/MPI 2001, LNCS 2131, pp. 378–385, 2001. © Springer-Verlag Berlin Heidelberg 2001

The contribution of this work is the implementation of a parallel flexible text searching algorithm using cluster computing techniques. This algorithm realized