The Design of an Acquisitional Query Processor For Sensor Networks *

Samuel Madden, Michael J. Franklin, and Joseph M. Hellerstein
UC Berkeley
{madden,franklin,jmh}@cs.berkeley.edu

Wei Hong
Intel Research, Berkeley
wei.hong@intel-research.net

ABSTRACT

We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination, and execution. We evaluate these issues in the context of TinyDB, a distributed query processor for smart sensor devices, and show how acquisitional techniques can provide significant reductions in power consumption on our sensor devices.

1. INTRODUCTION

In the past few years, smart sensor devices have matured to the point that it is now feasible to deploy large, distributed networks of such sensors [42, 23, 37, 8]. Sensor networks differ from other wireless, battery-powered environments in that they consist of tens or hundreds of autonomous nodes that operate without human interaction (e.g. configuration of network routes, recharging of batteries, or tuning of parameters) for weeks or months at a time. Furthermore, sensor networks are often embedded in some (possibly remote) physical environment from which they must monitor and collect data. The long-term, low-power nature of sensor networks, coupled with their proximity to physical phenomena, leads to a significantly altered view of software systems compared with that of more traditional mobile or distributed environments.
* This work has been supported in part by the National Science Foundation under ITR/IIS grant 0086057, ITR/IIS grant 0208588, ITR/IIS grant 0205647, ITR/SI grant 0122599, and by ITR/IM grant 1187-26172, as well as research funds from IBM, Microsoft, and the UC MICRO program.

SIGMOD 2003, June 9-12, San Diego, CA
Copyright 2003 ACM 1-58113-634-X/03/06 ...$5.00.

In this paper, we are concerned with query processing in sensor networks. Researchers have noted the benefits of a query-processor-like interface to sensor networks and the need for sensitivity to limited power and computational resources [27, 33, 41, 48, 34]. Prior systems, however, tend to view query processing in sensor networks simply as a power-constrained version of traditional query processing: given some set of data, they strive to process that data as energy-efficiently as possible. Typical strategies include minimizing expensive communication by applying aggregation and filtering operations inside the sensor network – strategies familiar from the push-down techniques of distributed query processing, which emphasize moving queries to data.

In contrast, we present acquisitional query processing (ACQP), where we focus on the significant new query processing opportunity that arises in sensor networks: the fact that smart sensors have control over where, when, and how often data is physically acquired (i.e. sampled) and delivered to query processing operators.
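Because sampling a sensor itself costs energy, a processor that controls acquisition can avoid paying for readings it will never use. The Python sketch below illustrates one consequence under explicitly hypothetical assumptions: each predicate in a conjunctive WHERE clause needs exactly one sample from a distinct sensor, with a known per-sample energy cost (arbitrary units) and an estimated, independent selectivity. It orders predicates by the classic expensive-predicate rank cost / (1 - selectivity) and acquires sensors lazily, stopping at the first failing predicate. The names (`Predicate`, `rank`, `expected_cost`, `evaluate`) and the cost figures are illustrative only. They are not TinyDB's API or measured values, and TinyDB's actual optimizer may work differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Predicate:
    sensor: str                     # attribute that must be sampled to test this predicate
    cost: float                     # hypothetical energy per sample (arbitrary units)
    selectivity: float              # estimated fraction of samples that pass, in [0, 1]
    test: Callable[[float], bool]   # the predicate itself, applied to the sampled value


def rank(p: Predicate) -> float:
    """Expensive-predicate rank: cost / (1 - selectivity); lower ranks run first."""
    if p.selectivity >= 1.0:
        return float("inf")         # a predicate that never filters belongs last
    return p.cost / (1.0 - p.selectivity)


def order_by_rank(preds: List[Predicate]) -> List[Predicate]:
    return sorted(preds, key=rank)


def expected_cost(order: List[Predicate]) -> float:
    """Expected sampling cost when acquisition stops at the first failing predicate.
    Assumes independent selectivities and one distinct sensor per predicate."""
    total, p_reach = 0.0, 1.0
    for p in order:
        total += p_reach * p.cost   # sampled only if every earlier test passed
        p_reach *= p.selectivity
    return total


def evaluate(order: List[Predicate], sample: Callable[[str], float]) -> bool:
    """Acquire each sensor lazily and skip the remaining samples once a test fails."""
    for p in order:
        if not p.test(sample(p.sensor)):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sensors: a cheap, selective light sensor and a costly accelerometer.
    light = Predicate("light", cost=1.0, selectivity=0.1, test=lambda v: v > 400)
    accel = Predicate("accel", cost=50.0, selectivity=0.5, test=lambda v: v > 0.2)

    plan = order_by_rank([accel, light])
    print([p.sensor for p in plan])             # ['light', 'accel']
    print(expected_cost(plan))                  # 6.0
    print(expected_cost([accel, light]))        # 50.5

    sampled = []

    def fake_sample(sensor: str) -> float:      # stands in for a real sensor read
        sampled.append(sensor)
        return {"light": 120.0, "accel": 0.9}[sensor]

    print(evaluate(plan, fake_sample), sampled)  # False ['light']
```

With these made-up numbers, testing the cheap light predicate first means the expensive accelerometer is sampled only on the 10% of readings that pass, for an expected cost of 1 + 0.1 * 50 = 6 units rather than 50 + 0.5 * 1 = 50.5 when the accelerometer is sampled first. A passive system that assumes all attributes already exist has no reason to make this choice, since it never sees the cost of acquisition.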
By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption compared to traditional passive systems that assume the a priori existence of data. Acquisitional issues arise at all levels of query processing: in query optimization, due to the significant costs of sampling sensors; in query dissemination, due to the physical co-location of sampling and processing; and, most importantly, in query execution, where choices of when to sample and which samples to process are made. Of course, techniques proposed in other research on sensor and power-constrained query processing, such as pushing down predicates and minimizing communication, are also important alongside ACQP and fit comfortably within its model.

We have designed and implemented an ACQP engine, called TinyDB (for more information on TinyDB, see [35]), which is a distributed query processor that runs on each of the nodes in a sensor network. TinyDB runs on the Berkeley Mica mote platform, on top of the TinyOS [23] operating system. We chose this platform because the hardware is readily available from commercial sources [13] and the operating system is relatively mature. TinyDB has many of the features of a traditional query processor (e.g. the ability to select, join, project, and aggregate data), but, as we will discuss in this paper, also incorporates a number of other features designed to minimize power consumption via acquisitional techniques. These techniques, taken in aggregate, can lead to orders-of-magnitude improvements in power consumption and increased accuracy of query results over non-acquisitional systems that do not actively control when and where data is collected.

We address a number of ACQP-related questions, including:

1. When should samples for a particular query be taken?
2. What sensor nodes have data relevant to a particular query?
3. In what order should samples for this query be taken, and how should sampling be interleaved with other operations?
4. Is it worth expending computational power or bandwidth to process and relay a particular sample?

Of these issues, question (1) is unique to ACQP. The remaining questions can be answered by adapting techniques that are similar to those found in traditional query processing. Notions of indexing and optimization, in particular, can be applied to answer questions (2) and (3), and question (4) bears some similarity to issues that arise in stream processing and approximate query answering. We will address each of these questions, noting the unusual kinds of indices, optimizations, and approximations that are required in ACQP under the specific constraints posed by sensor networks.

Figure 1 illustrates the basic architecture that we follow throughout this paper – queries are submitted at a powered PC (the base station),