In-network Outlier Cleaning for Data Collection in Sensor Networks

Yongzhen Zhuang and Lei Chen
Hong Kong University of Science and Technology
{cszyz, leichen}@cse.ust.hk

Abstract

Outliers are very common in the environmental data monitored by a sensor network consisting of many inexpensive, low-fidelity, and frequently failing sensors. Limited battery power and costly data transmission introduce a new challenge for outlier cleaning in sensor networks: it must be done in-network to avoid spending energy on transmitting outliers. In this paper, we propose an in-network outlier cleaning approach that combines wavelet-based outlier correction with neighboring DTW (Dynamic Time Warping) distance-based outlier removal. The cleaning is performed during multi-hop data forwarding and exploits the neighboring relation in a hop-count-based routing algorithm. Our approach guarantees that most outliers are either corrected or removed from further transmission within 2 hops. We simulated a spatially and temporally correlated environmental area and evaluated the outlier cleaning approach in it. The results show that our approach effectively cleans the sensing data and reduces outlier traffic.

1 Introduction

A sensor network is equipped with thousands of inexpensive, low-fidelity motes, which can easily generate sensing errors. Abnormal, unreal sensor readings generated by a temporarily or permanently failed sensor are called outliers. In many cases, outliers introduce errors into sensing queries and sensing-data analysis. For example, a Sum query becomes less accurate if a large-valued outlier is counted. In addition, transmitting outliers to the sink is useless: it adds traffic to the network and consumes precious sensor energy without any benefit. Outlier cleaning tries to capture outliers and either correct them or remove them from the data stream.
Outlier cleaning in sensor networks is challenging because data are distributed among a large number of sensors. Outlier detection can certainly be conducted centrally after all the data are collected at the sink. However, transmitting outliers is not energy efficient, especially when the network is large. For example, if an outlier is routed along a 15-hop path to the sink, the energy spent on all 15 transmissions of that datum is wasted. In-network outlier cleaning therefore tries to detect outliers as early as possible along the routing path during data collection, and either corrects each outlier or removes it from further forwarding. Eventually, an outlier-free data stream is delivered to sensor network applications.

In this paper, we propose an in-network outlier cleaning approach for data collection over sensor networks. We can correct short, simple outliers at 0 hops, that is, before the data leave the originating sensor, and remove long segmental outliers within 2 hops. We adopt wavelet approximation to correct short, occasionally appearing outliers. Because these short outliers are of high frequency, they can be corrected by representing the sensing series with only its first few wavelet coefficients. A further advantage of the wavelet representation is that it greatly reduces the dimension of the sensing data and, consequently, the energy cost of transmitting them. A long segmental outlier can be detected by comparing its similarity with the data of neighboring nodes, since environmental data are spatially correlated [1]. Similarity is measured by the Dynamic Time Warping (DTW) distance, which captures shape similarity between elastically shifted sensing series [2]. The sensing series are routed to the sink as before, using a hop-count-based routing algorithm [3], and detection is conducted within 2 forwarding hops: a sensing series is not forwarded if it is dissimilar to those of its network neighbors.
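The two cleaning steps described above, wavelet-based correction and DTW-based comparison with neighbors, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The choice of the unnormalized Haar wavelet, the number of retained coefficients `k`, the absolute-difference DTW cost, and the rule in the hypothetical helper `is_segmental_outlier` (flag a series only when its DTW distance to every neighbor exceeds a fixed `threshold`) are all assumptions made for illustration.

```python
from typing import List, Sequence


def haar_decompose(series: Sequence[float]) -> List[float]:
    """Unnormalized Haar transform of a series whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details], so the
    first coefficients describe the low-frequency shape of the series.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    current = [float(x) for x in series]
    details_so_far: List[float] = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        # Coarser details are prepended, so finer ones end up at the back.
        details_so_far = details + details_so_far
        current = averages
    return current + details_so_far


def haar_reconstruct(coeffs: Sequence[float]) -> List[float]:
    """Inverse of haar_decompose: each (average, detail) pair yields two samples."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        expanded: List[float] = []
        for avg, d in zip(current, details):
            expanded.extend([avg + d, avg - d])
        current = expanded
        pos += len(details)
    return current


def wavelet_correct(series: Sequence[float], k: int) -> List[float]:
    """Keep only the first k Haar coefficients (zeroing the high-frequency
    ones) and reconstruct; short, isolated spikes are attenuated."""
    coeffs = haar_decompose(series)
    if not 1 <= k <= len(coeffs):
        raise ValueError("k must be between 1 and len(series)")
    truncated = coeffs[:k] + [0.0] * (len(coeffs) - k)
    return haar_reconstruct(truncated)


def dtw_distance(s: Sequence[float], t: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with |s_i - t_j| as the local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of: repeat t_j, repeat s_i, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def is_segmental_outlier(series: Sequence[float],
                         neighbor_series: Sequence[Sequence[float]],
                         threshold: float) -> bool:
    """Hypothetical rule: drop a series that is dissimilar (DTW > threshold)
    to every neighbor's series; with no neighbors, keep it."""
    if not neighbor_series:
        return False
    return all(dtw_distance(series, nb) > threshold for nb in neighbor_series)


if __name__ == "__main__":
    spiky = [10, 10, 10, 50, 10, 10, 10, 10]
    print(wavelet_correct(spiky, 4))
    neighbors = [[20, 21, 22, 23], [20, 20, 21, 22]]
    print(is_segmental_outlier([60, 61, 62, 63], neighbors, threshold=5.0))
```

Truncation attenuates rather than erases an isolated spike, because the spike also leaves a trace in the coarser Haar coefficients: with `k = 4`, a single reading of 50 in a constant series of 10s becomes two readings of 30. The DTW step, by contrast, tolerates time shifts: `dtw_distance([0, 0, 1, 2, 3], [0, 1, 2, 3, 3])` is 0 even though the two series differ point by point.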
Outlier cleaning requires in-network data processing on each individual sensor mote. In sensor networks, it is widely accepted that data processing is more economical than data transmission [4]. The outlier cleaning process adds O(KN) running time on each sensor. In an error-prone sensor network, this energy cost is trivial compared with the energy saved by the reduced traffic.