Cost-Efficient Processing of Min/Max Queries over Distributed Sensors with Uncertainty

Zhenyu Liu
University of California
Los Angeles, CA 90095
vicliu@cs.ucla.edu

Ka Cheung Sia
University of California
Los Angeles, CA 90095
kcsia@cs.ucla.edu

Junghoo Cho
University of California
Los Angeles, CA 90095
cho@cs.ucla.edu

ABSTRACT
The rapid development in micro-sensors and wireless networks has made large-scale sensor networks possible. However, the wide deployment of such systems is still hindered by their limited energy supply, which quickly runs out under heavy communication. In this paper, we study the cost-efficient processing of aggregate queries, which are generally communication-intensive. In particular, we focus on MIN/MAX queries, whose answers require both the identity and the value of the MIN/MAX sensor. We study how to provide an error bound for such answers, and how to design an “optimal” sensor-contact policy that minimizes the communication cost of reducing the error to a user-tolerable level.

Categories and Subject Descriptors
F.0 [Theory of Computation]: General

General Terms
Theory

Keywords
MIN/MAX Query Processing, Query Answering with Uncertainty

1. INTRODUCTION
Sensor networks have tremendous potential to extend our capability in sensing and interacting with the surrounding environment. The rapid development in low-power micro-sensors and wireless networks has enabled prototype sensor networks to be deployed in controlled environments [1]. However, their wide deployment is still hindered by their relatively short lifespan, because sensors typically operate on a small battery and become inoperable once the battery runs out. It is therefore of paramount importance to minimize their battery use to extend the lifespan of a sensor network.

In a typical setting, sensors communicate through wireless channels because of the ease and low cost of initial deployment.
In this scenario, the communication among the sensors becomes the principal source of power usage [1], and a number of ongoing research efforts investigate various mechanisms to minimize sensor communication [2, 3, 4, 5, 1]. In this paper, we study how to process aggregate queries efficiently on a sensor network with minimal communication among the sensors. As our next example illustrates, aggregate queries are among the most “expensive” queries to support, because we potentially have to retrieve the current value from every sensor.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC’05 March 13-17, 2005, Santa Fe, New Mexico, USA
Copyright 2005 ACM 1-58113-964-0/05/0003 ...$5.00.

[Figure 1: A sensor network consisting of 1000 precipitation sensors distributed at various locations. Diagram: a Central Server connected to Sensor 1, Sensor 2, ..., Sensor 1000, labeled “Aggregation queries” and “Answers”.]

Example 1 Consider a sensor network consisting of 1,000 rainfall sensors (Figure 1). The user issues queries to a central server, which contacts each sensor to retrieve its current rainfall reading. The current values of all the sensors can be viewed as a 1000-tuple table, Sensors(id, rainfall), where id is the identity of each sensor and rainfall is its current rainfall value. Given this view, the following SQL query computes the maximum rainfall among the sensors:

    SELECT MAX(rainfall) FROM Sensors

Note that to compute the exact answer to this query, we potentially have to contact all the sensors, which incurs significant communication overhead across the entire sensor network.
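To make the cost in this example concrete, the following Python sketch computes the exact answer in the straightforward way: the central server polls every sensor once and keeps both the identity and the value of the current maximum. The Sensor class, its read() method and the exact_max function are hypothetical stand-ins introduced only for illustration; read() models one round of wireless communication, and the readings are made-up, deterministic data.

```python
from typing import Iterable, Optional, Tuple


class Sensor:
    """Hypothetical stand-in for one remote rainfall sensor."""

    def __init__(self, sensor_id: int, rainfall: float) -> None:
        self.sensor_id = sensor_id
        self._rainfall = rainfall

    def read(self) -> float:
        # In a real deployment this would be a wireless round trip,
        # i.e. exactly the communication cost we want to minimize.
        return self._rainfall


def exact_max(sensors: Iterable[Sensor]) -> Tuple[Optional[int], float, int]:
    """Answer SELECT MAX(rainfall) exactly by polling every sensor once.

    Returns (id of the MAX sensor, its value, number of sensors contacted).
    Ties are resolved in favor of the first sensor polled.
    """
    best_id: Optional[int] = None
    best_value = float("-inf")
    contacts = 0
    for sensor in sensors:
        value = sensor.read()  # one contact per sensor
        contacts += 1
        if value > best_value:
            best_id, best_value = sensor.sensor_id, value
    return best_id, best_value, contacts


if __name__ == "__main__":
    # 1,000 sensors with arbitrary, deterministic readings (illustrative data only).
    network = [Sensor(i, ((i * 37) % 1000) / 10) for i in range(1, 1001)]
    sensor_id, value, contacts = exact_max(network)
    print(f"MAX sensor {sensor_id}: {value} (contacted {contacts} sensors)")
```

Whatever the readings, the contact counter always equals the number of sensors, so the exact answer costs one message per sensor. Returning the identity along with the value reflects the point that a MAX answer on a sensor network usually needs both.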
✷

In many applications, however, the exact (or 100% accurate) answer may not be required, and users may tolerate a certain level of error in exchange for lower communication cost. In this context, we note that MIN and MAX are the two most difficult of the five SQL aggregate functions (MIN, MAX, AVG, SUM, COUNT) to answer approximately. For AVG, SUM and COUNT queries, all we need in the answer is a value. To process these “value-oriented” queries, we can simply “sample” the current values of a few sensors and compute an approximation from them. It is also possible to estimate the error bound of such an approximation using, say, the central limit theorem [6].

In contrast, for MIN or MAX queries on a sensor network, we often want to know not only the value but also the identity of the MIN/MAX sensor. For instance, suppose we want to query the above rainfall sensor network to detect possible flood conditions. To do so, we issue a MAX query to locate the sensor with the heaviest precipitation. Knowing the identity and knowing the value of the MAX sensor are equally crucial in this task: the sensor’s identity