Scheduling Multiple Continuous Queries to Improve QoD ∗

Mohamed A. Sharaf, Alexandros Labrinidis, Panos K. Chrysanthis, Kirk Pruhs
Department of Computer Science
University of Pittsburgh
Pittsburgh, PA 15260, USA
{msharaf, labrinid, panos, kirk}@cs.pitt.edu

ABSTRACT

Quality of Service (QoS) and Quality of Data (QoD) are the two major dimensions for evaluating any query processing system. In the context of the new data stream management systems (DSMSs), multi-query scheduling has been exploited to improve QoS. In this paper, we propose to exploit scheduling to improve QoD. Specifically, we present a new policy for scheduling multiple continuous queries with the objective of maximizing the freshness of the output data streams, and hence the QoD of such outputs. The proposed Freshness-Aware Scheduling of Multiple Continuous Queries (FAS-MCQ) policy decides the execution order of continuous queries based on each query's properties (i.e., cost and selectivity) as well as the properties of the input update streams (i.e., variability of updates). Our experimental results show that FAS-MCQ can increase freshness by up to 50% compared to existing scheduling policies used in DSMSs.

1. INTRODUCTION

Data stream processing is an emerging research area driven by the growing need for monitoring applications. A monitoring application continuously processes streams of data, looking for interesting, significant, or anomalous events. Monitoring applications have been used in important business and scientific information systems, for example, monitoring network performance, detecting disease outbreaks in real time, tracking the stock market, performing environmental monitoring via sensor networks, and providing personalized and customized Web pages.

For example, consider the University of Pittsburgh's Real-time Outbreak and Disease Surveillance (RODS) system (http://rods.health.pitt.edu).
Such a system receives data from different sources (e.g., hospitals, clinics, and pharmacies) and integrates it in order to detect correlations or abnormal events. When a disease outbreak is detected, the CDC and health departments are notified so that they can start mobilizing their resources.

∗ This is an extended version of our paper "Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web", which appears in the Proceedings of the 8th ACM WebDB Workshop (June 2005, held in conjunction with SIGMOD 2005).

Deploying monitoring applications efficiently requires advanced data processing techniques that support the continuous processing of rapid, unbounded data streams. Such techniques go beyond the capabilities of traditional store-then-query Database Management Systems. This need has led to a new data processing paradigm and a new generation of data processing systems, called Data Stream Management Systems (DSMSs), that support the execution of continuous queries (CQs) on data streams [23].

Aurora [4], STREAM [18], TelegraphCQ [5], Tribeca [21], Gigascope [10], Niagara [7] and Nile [11] are examples of current prototype DSMSs. In such systems, each monitoring application registers a set of CQs, and each CQ is executed continuously as new relevant data arrive (Figure 1). In the RODS example, health officials register queries that track specific indicators of disease outbreaks. As another example, a user might register a query to monitor news about tsunamis: as new articles arrive in the system, all the tsunami-related ones have to be propagated to that user. In general, the arrival of new updates triggers the execution of a set of corresponding queries, since portions of the new updates may be relevant to different queries. The output of such frequent execution of a continuous query is what we call an output data stream (see Figure 1).
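To make the register-then-trigger model concrete, the following is a minimal Python sketch of the tsunami example. It assumes that each CQ is nothing more than a filter predicate over one named input stream; the class `MiniDSMS`, its methods, and the stream and query names are hypothetical and do not reflect the API of Aurora, STREAM, or any other DSMS cited above. Note that triggered queries run immediately here, whereas a real DSMS with many triggered CQs must decide the order in which to execute them.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[str], bool]


class MiniDSMS:
    """Hypothetical toy registry: each CQ is a filter predicate on one input stream."""

    def __init__(self) -> None:
        # input stream name -> list of (query name, predicate) registered on it
        self.queries: Dict[str, List[Tuple[str, Predicate]]] = defaultdict(list)
        # query name -> its output data stream (here, an append-only list)
        self.outputs: Dict[str, List[str]] = defaultdict(list)

    def register(self, stream: str, name: str, predicate: Predicate) -> None:
        self.queries[stream].append((name, predicate))

    def on_update(self, stream: str, item: str) -> None:
        # A new update triggers every CQ registered on its stream; matching
        # items are appended to that query's output data stream. Triggered
        # queries run immediately, with no scheduling decision.
        for name, predicate in self.queries[stream]:
            if predicate(item):
                self.outputs[name].append(item)


dsms = MiniDSMS()
dsms.register("news", "tsunami_watch", lambda article: "tsunami" in article.lower())
dsms.register("news", "markets", lambda article: "stock" in article.lower())
for article in ["Tsunami warning issued", "Stock market closes higher"]:
    dsms.on_update("news", article)
print(dict(dsms.outputs))
# {'tsunami_watch': ['Tsunami warning issued'], 'markets': ['Stock market closes higher']}
```

A single update can be relevant to several registered queries, which is why one arrival may trigger more than one CQ.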
In the tsunami example, an output data stream can be used to continuously update a user's personalized Web page, where the user logs on and monitors updates as they arrive. It can also be used to send email notifications to the user when new results are available.

As the volume of updates on the input data streams increases and the number of registered queries grows, advanced query processing techniques are needed to efficiently synchronize the results of the continuous queries with the available updates. Efficient scheduling of updates is one such query processing technique, and it has been used successfully to improve the Quality of Data (QoD) provided by interactive systems.

In this paper, we focus on scheduling continuous queries to improve the QoD of output data streams. QoD can be measured in different ways, one of which is freshness. Freshness, as well as scheduling policies for improving it, has been studied in the contexts of replicated databases [8, 9], derived views [13], and distributed caches [19]. To the best of our knowledge, our work is the first to study the problem of freshness in the context of data streams. In this respect, our work can be regarded as complementary to current work on the processing of continuous queries, which considers only Quality of Service metrics such as response time and throughput (e.g., [7, 20, 3, 5, 1]).
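As a rough illustration of freshness as a QoD measure, the sketch below assumes that an output data stream is stale from the moment a relevant update arrives until the CQ has processed it. Under that assumption, freshness over an observation window is the fraction of the window during which the stream is not stale. The function `freshness` and its `(arrival, reflected)` interval representation are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (update arrival time, time the CQ output reflects it)


def freshness(stale: Iterable[Interval], start: float, end: float) -> float:
    """Fraction of [start, end] during which the output stream is fresh,
    i.e. not covered by any staleness interval (assumed definition)."""
    # Clip intervals to the window and drop those lying entirely outside it.
    clipped = sorted(
        (max(a, start), min(b, end)) for a, b in stale if b > start and a < end
    )
    stale_time = 0.0
    cur_start, cur_end = None, None
    for a, b in clipped:
        if cur_end is None or a > cur_end:
            # Disjoint from the current merged run: close it and start a new one.
            if cur_end is not None:
                stale_time += cur_end - cur_start
            cur_start, cur_end = a, b
        else:
            # Overlapping staleness (a second update arrived before the first
            # was processed) is counted only once.
            cur_end = max(cur_end, b)
    if cur_end is not None:
        stale_time += cur_end - cur_start
    return 1.0 - stale_time / (end - start)


f = freshness([(1, 3), (2, 4), (6, 7)], 0, 10)
print(round(f, 2))  # 0.6
```

In this example the first two intervals merge into [1, 4], so the stream is stale for 3 + 1 = 4 of 10 time units. Merging overlaps reflects the assumption that an already-stale stream does not become "more stale" when another update arrives; a metric that counted pending updates instead would weight that case differently.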