An Adaptive and Composite Spatio-Temporal Data Compression Approach for Wireless Sensor Networks

Azad Ali, Abdelmajid Khelil, Piotr Szczytowski and Neeraj Suri
Department of CS, TU Darmstadt, Germany
{azad, khelil, szczytowski, suri}@deeds.informatik.tu-darmstadt.de

ABSTRACT

Wireless Sensor Networks (WSN) are often deployed to sample the desired environmental attributes and deliver the acquired samples to the sink for processing, analysis or simulations as per the application needs. Many applications stipulate high granularity and data accuracy that results in high data volumes. Sensor nodes are battery powered and sending the requested large amount of data rapidly depletes their energy. Fortunately, the environmental attributes (e.g., temperature, pressure) often exhibit spatial and temporal correlations. Moreover, a large class of applications such as scientific measurement and forensics tolerate high latencies for sensor data collection. Accordingly, we develop a fully distributed adaptive technique for spatial and temporal in-network data compression with accuracy guarantees. We exploit the spatio-temporal correlation of sensor readings while benefiting from possible data delivery latency tolerance to further minimize the amount of data to be transported to the sink. Using real data, we demonstrate that our proposed scheme can provide significant communication/energy savings without sacrificing the accuracy of collected data. In our simulations, we achieved data compression of up to 95% on the raw data requiring around 5% of the original data to be transported to the sink.
Categories and Subject Descriptors

C.2.1 [Communication Networks]: Network Architecture and Design - network communications, wireless communication; I.6.5 [Simulation and Modeling]: Model Development - Modeling methodologies

General Terms

Algorithms, Design, Measurement, Performance

Keywords

spatial and temporal correlations, hierarchical clustering, energy efficiency, modeling, approximation, accuracy

Research supported in part by HEC and DFG GRK 1362 (TUD GKMM).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MSWiM’11, October 31–November 4, 2011, Miami, Florida, USA.
Copyright 2011 ACM 978-1-4503-0898-4/11/10 ...$10.00.

1. THE PROBLEM AND THE APPROACH

In WSN deployments, sensor nodes are often distributed over the monitoring area for unattended environmental monitoring or for supervisory functions. Typical WSN functionality comprises (i) local event detection and reporting to the sink, and (ii) continuous data collection, in which nodes sample the environment and send the samples to the sink. In this paper, we deal with continuous data collection. Applications utilize continuously collected data for (a) real-time decision making, such as surveillance, or (b) delay-tolerant processing, such as modeling, analysis [17] and inference [3]. In this work, we develop adaptive modeling algorithms that exploit the delay-tolerance of the data collection to maximize data compression.
For example, various scientific applications, such as volcano monitoring [21] or ecosystem monitoring [17], [5], require detailed ambient data with high spatio-temporal sampling resolution for a fine-granular understanding of the physical processes. For such applications, a WSN is essentially a spatio-temporal sampling system. High spatial resolution is achievable by sampling at the granularity of individual nodes. Similarly, high temporal resolution is achieved by transmitting samples at a high rate. Consequently, large data volumes need to be sent to the sink. Sending a message is a costly, energy-consuming operation, and fetching more data from the network results in more message transmissions and higher energy consumption. Sensor nodes have limited computational capabilities and are powered by finite energy sources. Accordingly, highly accurate continuous data collection is a challenging problem in WSNs given their energy, communication and computational constraints.

Fortunately, WSNs naturally exhibit high redundancy in spatial sampling due to the redundant sensor node deployment for connectivity and failure tolerance. Often, the sampled attributes are also temporally compressible due to temporal correlation [18]. Redundant deployment and temporal correlation allow a significant reduction of communication overhead through spatio-temporal compression. Various WSNs deployed for scientific monitoring [21], [5], [17] continuously harvest data for modeling, analysis and simulations. They generally tolerate a certain data collection latency. Hence, for such applications, real-time data acquisition does not take precedence over the quality and quantity of the data. This latency-tolerance in reporting the data to the sink opens up fundamental design flexibility in data collection to significantly improve WSN energy efficiency.
We propose a scheme to exploit the redundancies present in WSN along with the data collection delay-tolerance to maximize spatio-temporal data compression.
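
To illustrate why temporally correlated readings are compressible under an accuracy bound, consider the following minimal sketch. It is not the scheme proposed in this paper; it is a generic greedy piecewise-constant approximation, chosen only as a simple instance of error-bounded temporal compression. The function names `compress_constant_segments` and `decompress`, the parameter `max_error`, and the sample values are illustrative assumptions.

```python
from typing import List, Tuple


def compress_constant_segments(
    samples: List[float], max_error: float
) -> List[Tuple[int, float]]:
    """Greedy piecewise-constant approximation (illustrative only).

    Returns (start_index, value) pairs. Each segment's value is the
    midpoint of the minimum and maximum sample it covers, so it lies
    within max_error of every sample in the segment.
    """
    segments: List[Tuple[int, float]] = []
    if not samples:
        return segments
    start = 0
    lo = hi = samples[0]
    for i in range(1, len(samples)):
        new_lo = min(lo, samples[i])
        new_hi = max(hi, samples[i])
        if new_hi - new_lo > 2 * max_error:
            # The new sample would break the bound: close the segment.
            segments.append((start, (lo + hi) / 2))
            start = i
            lo = hi = samples[i]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, (lo + hi) / 2))
    return segments


def decompress(segments: List[Tuple[int, float]], length: int) -> List[float]:
    """Rebuild an approximation of the original series at the sink."""
    out: List[float] = []
    for k, (start, value) in enumerate(segments):
        end = segments[k + 1][0] if k + 1 < len(segments) else length
        out.extend([value] * (end - start))
    return out


# Hypothetical temperature readings with strong temporal correlation.
readings = [20.0, 20.1, 20.2, 20.1, 25.0, 25.1, 25.0]
segments = compress_constant_segments(readings, max_error=0.2)
approx = decompress(segments, len(readings))
```

In this sketch, a node would transmit only the segment list instead of every sample, and the sink would reconstruct each reading to within `max_error`. The slower the attribute changes relative to the sampling rate, the fewer segments are needed, which is the intuition behind exploiting temporal correlation.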