Scheduling for Fast Response Multi-pattern
Matching over Streaming Events
Abstract— Real-time pattern matching over event streams has
gained considerable attention recently due to the analytical
capability demanded by many operation-critical applications such
as credit card fraud detection, algorithmic stock trading and
RFID tracking. A common and important requirement
in these applications is fast response. Usually,
a large number of pattern queries are subscribed in the
system, running continuously and concurrently. However,
little research has been done on scheduling algorithms
and management to improve the overall response time of these
queries. To address this challenge, we study
how to improve the average response time of multiple pattern
queries. We first propose two static scheduling algorithms,
Event-based Scheduling (EBS) and Run-based Scheduling (RBS), and
discuss which is the better choice under different system configurations.
We then propose a hybrid method called Fast Response
Time Scheduling (FRTS) that dynamically manages the scheduling
in order to further reduce the average response time.
Experimental results show
that FRTS can improve the average response time by a factor
of 5 compared with the basic methods in some cases.
I. Introduction
Complex event processing (CEP) has regained considerable
attention in recent years, due to its expected ability to
help businesses detect, analyze and respond to complex,
time-varying events. It provides the enabling features for today’s
businesses to run their operations in a more agile way. Various
technologies and systems have been proposed in recent years
to process pattern queries over event
streams, such as the academic research systems Aurora [1], [2],
Borealis [3], STREAM [4], Telegraph [5], SASE [6], Cayuga
[7], PIPES [8] and industry products such as Coral8
& Aleri [9], StreamBase [10], Oracle CEP [11], etc.
In these modern applications, response time is extremely
important, not only for providing high-quality services but also
because it is a decisive factor for success in the
business. In stock trading, for instance, an investor places
a trading order and submits it to the electronic stockbroker
system according to the future trend he/she predicts. This
prediction is based on the current pattern matches that he/she
is interested in (e.g. double top, divergence of moving average
and price lines, or pair trades as introduced in [7]). When an
event arrives, if it happens to be the last event needed to satisfy
the conditions of a pre-defined pattern query, a pattern
matching result should be reported to the investor. If the
investor learns of the occurrence of the pattern of interest in
a timely manner, he/she can take the proper actions quickly
to either gain a substantial profit or avoid a major loss.
Therefore, we would like to reduce the time interval between
the arrival of the last event qualifying the pattern query and
the point at which we complete the query processing and send the
result back to the user. We call this time interval the response
time, and it is the quantity we aim to manage. Usually, a large
number of pattern queries are subscribed in the system and each
pattern generates multiple runtime instances. Therefore,
thousands or even millions of such instances run
continuously and concurrently. To reduce the response time
of multiple pattern queries, two kinds of techniques are relevant:
the query evaluation model of pattern matching over
event streams and multi-pattern query optimization.
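The definition of response time above can be made concrete with a small sketch. It is an illustration only, not one of the scheduling algorithms proposed in this paper: the function `response_times`, the query names and the processing costs are all hypothetical, and the sketch assumes that the last qualifying events of all queries arrive at the same instant and that pending work is served serially.

```python
from statistics import mean


def response_times(arrival, costs, order):
    """Response time of each query when pending work is served serially.

    arrival: arrival time of the last qualifying event (assumed to be the
             same for every query, to keep the sketch small)
    costs:   dict query_id -> processing time still needed to report the match
    order:   sequence of query ids, i.e. the schedule
    """
    clock = arrival
    result = {}
    for qid in order:
        clock += costs[qid]            # the result is sent when processing ends
        result[qid] = clock - arrival  # completion time minus arrival time
    return result


# Hypothetical processing costs, chosen only for illustration.
costs = {"double_top": 0.5, "pair_trade": 2.0}
short_first = response_times(10.0, costs, ["double_top", "pair_trade"])
long_first = response_times(10.0, costs, ["pair_trade", "double_top"])

print(mean(short_first.values()))  # 1.5
print(mean(long_first.values()))   # 2.25
```

Both schedules finish all work at the same time, so the throughput is identical, yet the average response time differs. This is the kind of gap that scheduling can exploit.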
Different from conventional selection, aggregation, join or
time series query processing over streaming events, pattern
matching involves complex predicates over a single event, between
pairs of successive events (e.g. $e_{i+1}.\mathit{price} > e_i.\mathit{price}$)
or among a group of events
(e.g. $e_j.\mathit{value} \le \sum_{i=1}^{j} e_i.\mathit{value}$).
The pattern that contains a sequence of events
of finite length that all satisfy a certain condition is called
a Kleene closure pattern. The complexity of Kleene closure
pattern matching is high, as proved in [12], because the
number of matching events is not known in advance. For the
pattern matching evaluation model, an NFA-based structure is
widely used [13], [12], [14], [15]. At runtime, each pattern
generates multiple runtime instances. As the number of
instances that a pattern splits into at runtime is unknown,
query optimization becomes another challenge. The authors
of [12], [14] provide algorithms to improve the throughput of
multiple runtime patterns. However, even in a high-throughput
plan, there is still room to improve the response time with
proper scheduling approaches. Though the recent effort in [16]
considers response time, it supports only
non-Kleene closure patterns.
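To illustrate why the number of runtime instances of a Kleene closure pattern is hard to bound, the following minimal sketch enumerates partial matches ("runs") for the predicate $e_{i+1}.\mathit{price} > e_i.\mathit{price}$. It is not the evaluation model of [12] or of this paper. The function `match_rising`, the minimum length parameter and the selection strategy (every run may either take or skip each event, with no time window) are assumptions made for illustration.

```python
def match_rising(prices, min_len=3):
    """Enumerate subsequences of strictly rising prices of length >= min_len.

    Each partial match is a run, stored as a tuple of the prices selected
    so far. Returns the list of matches and the number of live runs.
    """
    runs = []      # all active runs
    matches = []   # runs that already satisfy the minimum length
    for p in prices:
        new_runs = [(p,)]                    # the event may start a new run
        for run in runs:
            if p > run[-1]:                  # predicate on successive events
                new_runs.append(run + (p,))  # fork: a copy extended with p
        # The original runs stay alive, which models skipping the event.
        runs.extend(new_runs)
        matches.extend(r for r in new_runs if len(r) >= min_len)
    return matches, len(runs)


matches, live_runs = match_rising([1, 2, 3, 4])
print(len(matches))  # 5
print(live_runs)     # 15
```

On four strictly rising events the sketch already holds 15 runs, and every further rising event doubles that number plus one. A real engine would bound this growth with a time window and a stricter selection strategy, but the splitting behavior is what makes the instance count unknown in advance.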
In this paper, we propose scheduling approaches that
satisfy the applications’ need for fast-response pattern
matching over large numbers of simultaneous pattern queries
with Kleene closure. Our contributions are summarized as
follows:
Ying Yan, Jin Zhang, Ming-Chien Shan
Technology Lab, China
The Office of Chief Scientist, SAP
{ying.yan, gene.zhang, ming-chien.shan}@sap.com
978-1-4244-5446-4/10/$26.00 © 2010 IEEE ICDE Conference 2010 89