Three-level Processing of Multiple Aggregate Continuous Queries

Shenoda Guirguis 1, Mohamed A. Sharaf 2, Panos K. Chrysanthis 1, Alexandros Labrinidis 1

1 Department of Computer Science, University of Pittsburgh
{shenoda, panos, labrinid}@cs.pitt.edu
2 School of Information Technology and Electrical Engineering, The University of Queensland
m.sharaf@uq.edu.au

Abstract—Aggregate Continuous Queries (ACQs) are among the most common Continuous Queries across all classes of monitoring applications and typically have a high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems to reach their full potential. Existing optimization schemes for multiple ACQs focus on ACQs with varying window specifications and pre-aggregation filters, and assume a processing model in which each ACQ is computed as a final-aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called TriOps, that minimizes the repetition of operations at the sub-aggregation level, and a new multiple ACQs optimizer, called TriWeave, that is TriOps-aware. We analytically and experimentally demonstrate the performance gains of our proposed schemes, showing their superiority over alternative schemes. Finally, we generalize TriWeave to incorporate the classical subsumption-based multi-query optimization techniques for handling overlapping group-by attributes.

I. INTRODUCTION

Streams Aggregation. Aggregate Continuous Queries (ACQs) are among the most common Continuous Queries across all classes of monitoring applications (e.g., [16], [15], [21]). Typically, many ACQs monitor the same input data stream. In fact, more often than not, these ACQs also compute the exact same aggregate function, but differ slightly in their specifications, such as the window specifications, pre-aggregation filters (i.e., predicates), and group-by attributes.
For example, a network monitoring application could employ three ACQs to monitor the IP traffic, all of which compute the COUNT of incoming packets. While the first ACQ could report the count over the last minute, updated every five seconds, the second and third ACQs could report the count over the last minute, updated every half minute. Further, the first ACQ might be interested in the count of IP traffic originating from a specific source, i.e., have a predicate that the source IP has a certain value. The second and third ACQs, on the other hand, might count all received packets, but group them by source IP and destination IP, respectively. While many ACQs, like the three ACQs in our example above, compute the same aggregate function over the same input data stream, they have different specifications, depending on the user and the purpose of the ACQ.

Motivation. Given the cost and commonality of ACQs, optimizing their processing is crucial for Data Stream Management Systems (DSMSs) (e.g., [2], [4], [5], [13], [6], [7], [27], [25], [26]) to achieve the scalability needed to handle the typically large volumes of data and large numbers of ACQs. This need has motivated the development of several techniques for the efficient processing of ACQs, which can be broadly classified into techniques for: 1) the implementation of the continuous aggregation operator, and 2) the multi-query optimization of multiple ACQs.

Under the first set of techniques (i.e., operator implementation), partial aggregation has been proposed to minimize the repeated processing of overlapping data windows within a single aggregate (e.g., [17], [18], [16], [9]). In particular, partial aggregation aims at processing each input tuple only once and assembling the final aggregate value from a set of partial aggregate values.
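As a rough illustration of partial aggregation, the following Python sketch computes a sliding-window COUNT by counting each tuple once inside a non-overlapping slice and then summing the most recent slice counts. It is a minimal sketch under stated assumptions, not the operator implementation of any cited system: the function name `two_level_count`, its parameters, the choice of slice length equal to the slide, and the requirement that the range be a multiple of the slide are all hypothetical simplifications introduced here.

```python
from collections import deque


def two_level_count(timestamps, range_s=60, slide_s=5):
    """Yield (window_end, count) pairs for a sliding-window COUNT.

    Hypothetical helper for illustration only. Assumes timestamps are
    non-negative, non-decreasing seconds, and that range_s is a multiple
    of slide_s, so each window is covered by a whole number of slices.
    """
    if range_s % slide_s != 0:
        raise ValueError("range_s must be a multiple of slide_s")

    slices_per_window = range_s // slide_s
    # Holds only the partial counts that still belong to the current window.
    partials = deque(maxlen=slices_per_window)
    slice_end = slide_s
    current = 0  # partial count of the slice being filled

    for ts in timestamps:
        # Close every slice that ends at or before this tuple's timestamp.
        while ts >= slice_end:
            partials.append(current)        # sub-aggregate output
            yield slice_end, sum(partials)  # final aggregate over partials
            current = 0
            slice_end += slide_s
        current += 1  # each input tuple is touched exactly once

    # Flush the last (possibly incomplete) slice at end of input.
    partials.append(current)
    yield slice_end, sum(partials)


if __name__ == "__main__":
    # Small example: 10-second range, 5-second slide.
    for window_end, count in two_level_count([0, 1, 2, 5, 6, 11], 10, 5):
        print(window_end, count)
```

With the default parameters (a one-minute range and a five-second slide, as in the first ACQ of the example above), each result is assembled from at most twelve partial counts rather than by rescanning every tuple in the window.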
Specifically, ACQ processing is modeled as a two-level (i.e., two-operator) query execution plan: in the first level, a sub-aggregate function is computed over the data stream, generating a stream of partial aggregates, whereas in the second level, a final-aggregate function is computed over those partial aggregates. We refer to this two-level aggregation processing model as TwoOps hereafter.

Under the second set of techniques (i.e., multi-query optimization), the general principle is to minimize (or eliminate) the repeated processing of overlapping operations across multiple ACQs. This repetition occurs when the same data is processed by different queries that overlap in at least one of the following specifications: 1) predicate conditions, 2) group-by attributes, or 3) window settings. On the one hand, leveraging overlaps in predicate conditions and group-by attributes across different queries has been the focus of intensive research on traditional multi-query optimization, which typically relies on the detection of common subexpressions. On the other hand, the introduction of window-based continuous queries for the processing of stateful operators (i.e., joins and aggregates) over unbounded data streams has motivated recent research on the shared processing of queries with overlapping windows. For instance, the Shared Time Slices technique [16] has been proposed to share the processing of multiple ACQs with varying windows. It has also been extended into Shared