Please cite this article in press as: Lee, C.-H., et al., Effective processing of continuous group-by aggregate queries in sensor networks. J. Syst. Software (2010), doi:10.1016/j.jss.2010.08.049

The Journal of Systems and Software xxx (2010) xxx–xxx

Effective processing of continuous group-by aggregate queries in sensor networks

Chun-Hee Lee a, Chin-Wan Chung a,*, Seok-Ju Chun b

a Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 305-701, Republic of Korea
b Department of Computer Education, Seoul National University of Education, Seoul 137-742, Republic of Korea

Article history: Received 16 April 2009; received in revised form 18 June 2010; accepted 18 August 2010

Keywords: Sensor network; Group-by aggregate query; Haar wavelet; Two-phase collection

Abstract

Aggregate queries are among the most important queries in sensor networks. In particular, group-by aggregate queries can be used in various sensor network applications such as tracking, monitoring, and event detection. However, most research has focused on aggregate queries without a group-by clause. In this paper, we propose a framework, called the G-Framework, to effectively process continuous group-by aggregate queries in environments where sensors are grouped by geographical location. In the G-Framework, we perform energy-efficient data aggregation and dissemination using two-dimensional Haar wavelets. In addition, to process continuous group-by aggregate queries with a HAVING clause, we divide data collection into two phases: we send only non-filtered data in the first collection phase, and send the data requested by the leader node in the second collection phase.
Experimental results show that the G-Framework can process continuous group-by aggregate queries effectively in terms of energy consumption.

© 2010 Elsevier Inc. All rights reserved.

* Corresponding author. Tel.: +82 42 350 3537; fax: +82 42 350 7737.
E-mail addresses: leechun@islab.kaist.ac.kr (C.-H. Lee), chungcw@kaist.edu (C.-W. Chung), chunsj@snue.ac.kr (S.-J. Chun).

1. Introduction

Sensor networks consist of small sensors that have computing and communication facilities. With the advancement of sensor technology, sensors are becoming smaller and more powerful. Moreover, as the price of sensors falls, we expect that a large number of sensors will be used in various sensor network applications.

For example, a volcanologist can use a sensor network to monitor a dangerous active volcanic area. Low-priced sensors can be scattered over the dangerous area from an airplane. These sensors form a sensor network and monitor the volcano without human intervention. However, sensors have very limited resources (e.g., memory, computation, communication, and energy). Among these resources, energy is one of the most important, since battery replacement is difficult or impossible in such environments.

In sensor networks, individual sensor readings are raw data, so many applications use aggregate values instead. In many cases, the aggregate values of many regional areas are preferred to the aggregate value of the whole area, since the aggregate value of the whole area does not provide detailed information. That is, group-by aggregate queries are useful in sensor networks. Therefore, in this paper, we consider continuous group-by aggregate queries.

Due to many shortcomings of current technology, it is difficult to manage a large number of sensors. One effective method for dealing with many sensors is clustering (Heinzelman et al., 2002; Younis and Fahmy, 2004).
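To make the contrast between a whole-area aggregate and per-region aggregates concrete, the following is a minimal sketch and not part of the G-Framework. The region identifiers, the readings, and the function names `whole_area_avg` and `group_by_avg` are hypothetical, and the optional `having_min` threshold only imitates the kind of group-level filter that a HAVING clause expresses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings as (region_id, temperature) pairs.
# Region "R3" is much hotter than the other two regions.
readings = [
    ("R1", 20.0), ("R1", 21.0), ("R1", 22.0),
    ("R2", 20.0), ("R2", 19.0), ("R2", 21.0),
    ("R3", 80.0), ("R3", 82.0), ("R3", 84.0),
]


def whole_area_avg(readings):
    """Average over all readings, ignoring which region they came from."""
    return mean(value for _, value in readings)


def group_by_avg(readings, having_min=None):
    """Average per region; optionally keep only regions whose average
    exceeds having_min (an illustrative stand-in for a HAVING predicate)."""
    groups = defaultdict(list)
    for region, value in readings:
        groups[region].append(value)
    result = {region: mean(values) for region, values in groups.items()}
    if having_min is not None:
        result = {r: avg for r, avg in result.items() if avg > having_min}
    return result
```

With these assumed readings, `whole_area_avg(readings)` yields 41.0, which does not reveal where the heat is, whereas `group_by_avg(readings)` yields 21.0, 20.0, and 82.0 for R1, R2, and R3, and `group_by_avg(readings, having_min=50.0)` keeps only R3.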
Since sensor readings have spatial correlations, spatial clustering of sensors has many benefits. Therefore, we deal with group-by aggregate queries in environments where sensors are grouped (clustered) by geographical location.

A group-by aggregate query may have a HAVING clause, which is a predicate on the aggregate of each group. The queries we consider in this paper are shown in Fig. 1. However, we focus on the query in Fig. 2(a), since the processing of the queries in Fig. 1 can be extended from the processing of the query in Fig. 2(a). The G-Framework can also process local predicates in a straightforward way: each node checks whether its sensor readings satisfy the local predicates and builds a bitmap, and then sends only the satisfying data together with the bitmap. Therefore, for convenience of explanation, we do not discuss local predicates further in this paper.

Many papers have proposed methods for processing aggregate queries (Madden et al., 2002; Fan et al., 2002; Considine et al., 2004; Nath et al., 2004; Shrivastava et al., 2004; Deligiannakis et al., 2004; Sharaf et al., 2003, 2004). However, most of them do not consider group-by aggregate queries. Although some papers deal with processing group-by aggregate queries, they do not focus on grouping by geographical location. In this paper, we focus on processing group-by aggregate queries by geographical location, which can be used in many sensor network applications such as tracking, monitoring, and event detection. To process them, we assume the following: Sensors are grouped according to the geographical location. See Fig. 2(b). A group consists of a leader node and member nodes. A leader node and member nodes are connected in one hop (the