Enhance Exploring Temporal Correlation for Data Collection in WSNs

Ngoc Duy Pham, Trong Duc Le, and Hyunseung Choo
School of Information and Computer Engineering
Sungkyunkwan University, Korea
Email: {phmngocduy, letrongduc}@skku.edu, choo@ece.skku.ac.kr

Abstract: Continuous data collection applications in wireless sensor networks require sensor nodes to continuously sample the surrounding physical phenomenon and return the data to a processing center. Battery-operated sensors must avoid heavy use of their wireless radio, so they compress the sensed time series instead of transmitting it in raw form. One of the most commonly used compression methods is piecewise linear approximation. Previously, Liu et al. proposed the greedy PLAMLiS algorithm, which approximates the time series by a number of line segments in Θ(n² log n) time; however, this cost is too high for processing on the sensors. We therefore propose an alternative algorithm that obtains the same result in a shorter running time. Theoretical analysis and comprehensive simulations show that the proposed algorithm has a competitive computational cost of Θ(n log n) and also reduces the number of line segments, so it decreases the overall radio transmission load and saves energy at the sensor nodes.

I. INTRODUCTION

One of the main objectives of wireless sensor networks (WSNs) is to collect environmental sensor readings [1]; that is, each sensor node periodically collects local measurements of interest, such as illumination, temperature, and humidity, and transmits them back to the sink node. Typically, each node measures the environmental parameters at a fixed time interval, and the time-ordered sequence of samples constitutes a time series.
Because of the nature of the physical phenomenon, there is significant temporal correlation within the time series of sensor readings: the sensed data is quite similar over a short period of time, so future values can be predicted from previous measurements. These correlations can be captured by mathematical models such as wavelet transforms or linear models (see the example in Fig. 1). The time series can therefore be approximated with a suitable mathematical model, and the resulting approximation is usually much smaller than the whole data series. Transferring compressed data instead of raw data can significantly reduce the energy consumed by communication in the network [10].

Fig. 1. The time series (a) and its piecewise linear representation (b).

A number of research efforts already exist to exploit temporal correlation [2], [9], [11], and to classify, cluster, and index time series [5], [7]. Some of these are meant to be executed on a server, which has enough computational resources for mining large time series online or offline. Unfortunately, sensor nodes in WSNs are very limited in computational and energy resources, which makes existing methods not very effective.

One of the most noteworthy approaches is the Energy Efficient Data Collection (EEDC) framework proposed by Liu et al. [3], [4]. It introduces a greedy approximation technique with a computation cost of Θ(n² log n), where n is the length of the time series. In general, the purpose of the algorithm is to find the minimum number of line segments that approximate the time series such that the difference between any approximated value and the actual value is less than a given error bound ε. However, EEDC and most existing studies either investigate only the theoretical aspects of the correlation or provide an approximation algorithm for the sensor nodes that has a relatively high computing cost.
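To make the error-bound condition concrete, the following is a minimal Python sketch of an error-bounded piecewise linear approximation. It is an illustration only: it is neither PLAMLiS nor the algorithm proposed in this paper. The function names `segment_fits` and `greedy_pla` are hypothetical, and the sketch assumes unit-spaced samples, segments whose endpoints are actual samples, and an error test of "at most ε" rather than "strictly less than ε".

```python
from typing import List, Tuple


def segment_fits(y: List[float], i: int, j: int, eps: float) -> bool:
    """Check whether the line through (i, y[i]) and (j, y[j]) stays
    within eps of every sample strictly between indices i and j."""
    if j == i:
        return True
    slope = (y[j] - y[i]) / (j - i)
    return all(
        abs(y[i] + slope * (k - i) - y[k]) <= eps
        for k in range(i + 1, j)
    )


def greedy_pla(y: List[float], eps: float) -> List[Tuple[int, int]]:
    """Cover y with connected segments given as (start, end) index pairs.

    Each segment is extended one sample at a time and closed as soon as
    the next extension would violate the error bound. This simple
    extension rule is an assumption of the sketch; it does not guarantee
    the minimum number of segments.
    """
    n = len(y)
    if n == 0:
        return []
    if n == 1:
        return [(0, 0)]
    segments: List[Tuple[int, int]] = []
    start = 0
    while start < n - 1:
        end = start + 1  # two adjacent samples always form a valid segment
        while end + 1 < n and segment_fits(y, start, end + 1, eps):
            end += 1
        segments.append((start, end))
        start = end  # the next segment begins where this one ended
    return segments


if __name__ == "__main__":
    readings = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
    print(greedy_pla(readings, eps=0.1))
```

For the seven readings in the usage example, the sketch returns two segments that share the sample at index 3, so only three endpoint samples would need to be transmitted instead of seven. Each extension rechecks the whole candidate segment, so the sketch is quadratic in the worst case and makes no attempt to match the running times discussed in the text.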
In this paper, to exploit the temporal correlation, we adopt an alternative piecewise linear approximation algorithm that approximates the time series by a sequence of line segments. The advantage of this approach is a shorter running time of Θ(n log n). Moreover, the experimental results show that the proposed algorithm requires fewer line segments. Accordingly, for continuous data collection applications in WSNs, the new algorithm saves energy in both the computation and the communication needed at every individual sensor node.

The remainder of this paper is organized as follows: Section II briefly discusses previous work in this area, Section III presents the new algorithm for approximating the time series of sensed data, Section IV describes the performance evaluation using simulation results, and Section V presents the conclusions of the research.

978-1-4244-2379-8/08/$25.00 (c)2008 IEEE

II. EXPLOITING TEMPORAL CORRELATION

In continuous data collection applications, sensor nodes obtain measurement samples periodically and send the collected readings to a buffer. When the buffer is full, the node treats the data stream as a time series and transmits it back to the sink (processing center). However, transmitting