Journal of Computer Science 4 (5): 393-401, 2008
ISSN 1549-3636
© 2008 Science Publications
Corresponding Author: Amjad Hudaib, Department of Computer Information Systems, University of Jordan, Amman 11942,
Jordan Tel.: +962-5355000/ext: 22610 Fax: +962-5354070
A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW)
Amjad Hudaib, Rola Al-Khalid, Dima Suleiman, Mariam Itriq and Aseel Al-Anani
Department of Computer Information Systems, University of Jordan, Amman 11942 Jordan
Abstract: In this research, we propose a fast pattern matching algorithm: the Two Sliding Windows
(TSW) algorithm. The algorithm makes use of two sliding windows, each of which has a size equal to
the pattern length. Both windows slide in parallel over the text until the first occurrence of the
pattern is found or until both windows reach the middle of the text. The experimental results show that
the TSW algorithm is superior to other algorithms, especially when the pattern occurs at the end of the text.
Key words: Pattern matching, string matching, Berry-Ravindran algorithm, Boyer-Moore
INTRODUCTION
Pattern matching is a pivotal theme in computer
research because of its relevance to various applications
such as web search engines, computational biology,
virus scan software, network security and text
processing [1-4].
Pattern matching focuses on finding the
occurrences of a particular pattern P of length m in a
text T of length n. Both the pattern and the text are
built over a finite alphabet ∑ of size σ.
Generally, pattern matching algorithms make use
of a single window whose size is equal to the pattern
length [5]. The searching process starts by aligning the
pattern to the left end of the text and then the
corresponding characters from the pattern and the text
are compared. Character comparisons continue until a
whole match is found or a mismatch occurs; in either
case, the window is then shifted to the right by a certain
distance [6-12]. The shift value, the direction of the sliding
window and the order in which comparisons are made
vary among pattern matching algorithms.
Some pattern matching algorithms concentrate on
the pattern itself [5]. Other algorithms compare the
corresponding characters of the pattern and the text
from left to right [6]. Others perform character
comparisons from right to left [8,11]. The performance of
the algorithms can be enhanced when comparisons are
done in a specific order [9,13]. In some algorithms, such
as the Brute Force and Horspool algorithms, the
order of comparisons is irrelevant [7].
In this study, we propose a new pattern matching
algorithm: the Two Sliding Windows algorithm
(TSW). The algorithm concentrates on both the pattern
and the text. It makes use of two windows, each of size
equal to the size of the pattern. The first window is
aligned with the left end of the text, while the second
window is aligned with the right end of the text. Both
windows slide at the same time (in parallel) over the
text in the searching phase to locate the pattern. The
windows slide towards each other until the first
occurrence of the pattern from either side of the text is
found or they reach the middle of the text. If required,
all the occurrences of the pattern in the text can be
found.
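The two-window idea described above can be illustrated with a minimal Python sketch. This is not the TSW algorithm itself: the function name `two_window_search` is hypothetical, and the sketch assumes that each window moves by exactly one position per step, whereas TSW computes larger shift values in a pre-processing phase that is not covered in this passage. The sketch only shows how a left window and a right window can scan towards each other and stop at the first occurrence found from either side.

```python
def two_window_search(text, pattern):
    """Return the start index of the occurrence found first by either
    window, or -1 if the pattern does not occur in the text.

    Assumption: both windows shift by one position per step. The real
    TSW shift rule is not reproduced here.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1

    left = 0          # start of the window aligned with the left end
    right = n - m     # start of the window aligned with the right end

    # The windows move towards each other until they meet in the middle.
    while left <= right:
        if text[left:left + m] == pattern:
            return left
        if text[right:right + m] == pattern:
            return right
        left += 1
        right -= 1
    return -1


if __name__ == "__main__":
    # The pattern sits at the end of the text, so the right window
    # finds it on the very first attempt.
    print(two_window_search("xxxxxabc", "abc"))
```

In this sketch every start position from 0 to n - m is examined by one of the two windows, so an occurrence anywhere in the text is still found; an occurrence near the right end is simply reached much earlier than with a single left-to-right window.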
Related works: Several pattern matching algorithms
have been developed with a view to enhancing the
searching process by minimizing the number of
comparisons performed [14-16]. To reduce the number of
comparisons, the matching process is usually divided
into two phases: the pre-processing phase and the
searching phase. The pre-processing phase determines
the distance (shift value) that the pattern window will
move. The searching phase uses this shift value while
searching for the pattern in the text with as few
character comparisons as possible.
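The split into a pre-processing phase and a searching phase can be made concrete with a short Python sketch. The Horspool bad-character rule is used here purely as a familiar stand-in for "a pre-computed shift value"; it is not the shift rule of the algorithm proposed in this paper, and the function names `build_shift_table` and `horspool_search` are hypothetical.

```python
def build_shift_table(pattern):
    """Pre-processing phase: for every character of the pattern except
    the last, store the distance from its rightmost occurrence to the
    end of the pattern."""
    m = len(pattern)
    # Later (more rightward) occurrences overwrite earlier ones, so each
    # character ends up with its smallest, safe shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text, pattern):
    """Searching phase: slide one window over the text and use the
    pre-computed table to decide how far to shift after each attempt."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1

    shift = build_shift_table(pattern)
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift according to the text character under the window's last
        # position; characters absent from the table allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    print(build_shift_table("abcab"))
    print(horspool_search("xxabcabxx", "abcab"))
```

The table is built once from the pattern alone, so its cost is independent of the text length, while the searching phase can skip several text positions per attempt.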
In the Brute Force algorithm (BF), no pre-processing
phase is performed. It compares the pattern with the
text from left to right. After each attempt, it shifts the
pattern by exactly one position to the right. The time
complexity of the searching phase is O(mn) in the
worst case and the expected number of text character
comparisons is 2n.
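The Brute Force procedure can be sketched in a few lines of Python. The function name `brute_force_search` and the comparison counter are illustrative additions, included only so that the cost of the searching phase can be observed directly.

```python
def brute_force_search(text, pattern):
    """Return (match_positions, comparisons) for a brute-force search.

    No pre-processing is done. The pattern is compared with the text
    from left to right and shifted by exactly one position after each
    attempt.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0

    matches = []
    comparisons = 0
    for pos in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1          # one text character comparison
            if text[pos + j] != pattern[j]:
                break                 # mismatch: abandon this attempt
            j += 1
        if j == m:
            matches.append(pos)       # whole pattern matched at pos
    return matches, comparisons


if __name__ == "__main__":
    print(brute_force_search("abababa", "aba"))
    # A text and pattern that force m comparisons at every position.
    print(brute_force_search("aaaa", "aab"))
```

In the second call every attempt fails only at the last pattern character, which is the situation that produces the O(mn) worst case.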
Many algorithms, such as the Boyer-Moore (BM) [11,17]
and Knuth-Morris-Pratt (KMP) [6,18] algorithms, propose
ways to reduce the number of comparisons performed
by moving the pattern more than one position at a time.