Scheduling for Fast Response Multi-pattern Matching over Streaming Events

Abstract—Real-time pattern matching over event streams has recently attracted much attention because of the analytical capability demanded by many operation-critical applications such as credit card fraud detection, algorithmic stock trading and RFID tracking. A common and important requirement of these applications is fast response. Typically, a large number of pattern queries are subscribed in the system, running continuously and concurrently. However, little research has addressed scheduling algorithms and management techniques that improve the overall response time of these queries. To address this challenge, we study how to improve the average response time of multiple pattern queries. We first propose two static scheduling algorithms, Event-based Scheduling (EBS) and Run-based Scheduling (RBS), and discuss which is the better choice under different system configurations. We then develop a hybrid method called Fast Response Time Scheduling (FRTS) that manages scheduling dynamically to further reduce the average response time. Our experimental results show that, in some cases, FRTS reduces the average response time by a factor of 5 compared with the basic methods.

I. Introduction

Complex event processing (CEP) has regained a lot of attention in recent years due to its expected capability to help businesses detect, analyze and respond to complex, time-varying events. It provides the enabling features for today’s businesses to run their operations in a more agile way.
Various technologies and systems that process pattern queries over event streams have been proposed in recent years, including the academic research systems Aurora [1], [2], Borealis [3], STREAM [4], Telegraph [5], SASE [6], Cayuga [7] and PIPES [8], and industrial products such as Coral8 & Aleri [9], StreamBase [10] and Oracle CEP [11]. In these modern applications, response time is extremely important, not only for providing high-quality services but also because it is a decisive factor for business success. In stock trading, for instance, an investor places a trading order and submits it to an electronic stockbroker system according to the future trend he/she predicts. This prediction is based on current matches of the patterns he/she is interested in (e.g. double top, divergence of moving average and price lines, or pair trades as introduced in [7]). When an event arrives, if it happens to be the last event needed to satisfy the conditions of a predefined pattern query, a pattern matching result should be reported to the investor. If the investor learns of the occurrence of the pattern in a timely manner, he/she can act quickly to either gain a substantial profit or avoid a major loss. Therefore, we would like to reduce the time interval between the arrival of the last event that qualifies the pattern query and the point at which query processing completes and the result is sent back to the user. We call this interval the response time; it is the quantity we aim to manage. Usually, a large number of pattern queries are subscribed to the system, and each pattern generates multiple runtime instances. Consequently, thousands or even millions of such instances run continuously and concurrently.
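To make "runtime instances" and "response time" concrete, the following is a minimal Python sketch under stated assumptions; it is not the EBS, RBS or FRTS algorithm of this paper. It assumes a hypothetical pattern (a run of events with strictly rising prices, reported once it holds at least `min_len` events), skip-till-any-match style run splitting, and a single simulated worker where each (event, run) predicate check costs `cost` time units. The names `Event`, `match_rising_prices` and the `longest_first` ordering flag are illustrative inventions. Response time is measured, as defined above, from the arrival of the last matching event to the moment its match is produced.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    arrival: float  # time at which the event reaches the engine
    price: float


Run = Tuple[Event, ...]  # one runtime instance = the events it has accepted so far


def match_rising_prices(
    stream: List[Event], min_len: int, cost: float, longest_first: bool = False
) -> Tuple[List[Run], List[float], int]:
    """Toy Kleene-closure matcher for strictly rising prices (hypothetical pattern).

    Assumes min_len >= 2. Every event starts a new run, and every live run whose
    last price is lower than the new event's price is split: an extended copy is
    added while the original run stays alive, so the number of runtime instances
    grows quickly. Each (event, run) predicate check costs `cost` time units.
    """
    runs: List[Run] = []
    matches: List[Run] = []
    response_times: List[float] = []
    clock = 0.0
    peak_runs = 0
    for e in stream:
        clock = max(clock, e.arrival)  # the worker idles until the event arrives
        # The evaluation order is the scheduling decision; sorted() is stable.
        order = sorted(runs, key=len, reverse=True) if longest_first else runs
        new_runs: List[Run] = []
        for run in order:
            clock += cost  # one predicate evaluation: e.price > run[-1].price
            if e.price > run[-1].price:
                extended = run + (e,)
                new_runs.append(extended)
                if len(extended) >= min_len:
                    matches.append(extended)
                    # Response time: completion time minus arrival of the last event.
                    response_times.append(clock - e.arrival)
        runs.extend(new_runs)
        runs.append((e,))  # the event may also start a new run
        peak_runs = max(peak_runs, len(runs))
    return matches, response_times, peak_runs


if __name__ == "__main__":
    stream = [Event(0, 1), Event(1, 2), Event(2, 3)]
    for flag in (False, True):
        m, rt, peak = match_rising_prices(stream, min_len=3, cost=0.5, longest_first=flag)
        print(flag, len(m), rt, peak)
```

On this three-event stream, the number of live runs grows to 7 after only three events, and the single match (prices 1, 2, 3) has a response time of 1.0 time units in arrival order but 0.5 when the longer run is evaluated first. The total work is identical in both cases; only the evaluation order changes, which illustrates why scheduling can improve response time even when throughput is fixed.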
To reduce the response time of multiple pattern queries, two kinds of techniques are relevant: the query evaluation model for pattern matching over event streams, and multi-pattern query optimization. Unlike conventional selection, aggregation, join or time-series query processing over streaming events, pattern matching involves complex predicates over a single event, between pairs of subsequent events (e.g. $e_{i+1}.\mathit{price} > e_i.\mathit{price}$) or among a group of events (e.g. $e_j.\mathit{value} \le \sum_{i=1}^{j} e_i.\mathit{value}$). A pattern that matches a finite-length sequence of events, all of which satisfy a certain condition, is called a Kleene closure pattern. As proved in [12], Kleene closure pattern matching has high complexity because the number of matching events cannot be predicted in advance. For the pattern matching evaluation model, NFA-based structures are widely used [13], [12], [14], [15]. At runtime, each pattern may generate multiple runtime instances. Since the number of instances into which a pattern splits at runtime is unknown, query optimization becomes another challenge. The authors of [12], [14] provide algorithms to improve the throughput of multiple runtime instances. However, even under a high-throughput plan, there is still room to improve the response time with proper scheduling approaches. Although the recent effort in [16] considers response time, it only supports matching of non-Kleene closure patterns. In this paper, we propose scheduling approaches that meet applications’ need for fast-response pattern matching over large numbers of simultaneous pattern queries with Kleene closure. Our contributions are summarized as follows:

Ying Yan, Jin Zhang, Ming-Chien Shan
Technology Lab, China, The Office of Chief Scientist, SAP
{ying.yan, gene.zhang, ming-chien.shan}@sap.com

978-1-4244-5446-4/10/$26.00 © 2010 IEEE
ICDE Conference 2010