The 10th Conference for Informatics and Information Technology (CIIT 2013)
©2013 Faculty of Computer Science and Engineering

COMPARISON OF DIFFERENT DATA PREDICTION METHODS FOR WIRELESS SENSOR NETWORKS

Biljana Risteska Stojkoska
Faculty of Computer Science and Engineering
Skopje, Macedonia

Kliment Mahoski
Faculty of Computer Science and Engineering
Skopje, Macedonia

ABSTRACT

Different data reduction strategies have been developed in order to reduce the energy consumption in wireless sensor networks (WSN). Most of them reduce the amount of sent data by predicting the measured values both at the source and at the sink, requiring transmission only if a reading differs from the predicted value by more than a given margin. The subject of this paper is a comparison of a few different techniques for the prediction of time series data in WSN. These strategies often provide a great reduction in power consumption without needing a priori knowledge of the explored domain in order to correctly model the expected values.

I. INTRODUCTION

Distributed WSN provide the ability to track the temporal and spatial progression of the quantities they measure. If the nodes report sensed data at each interval, the network lifetime is vastly reduced and significant communication overhead is created. Several techniques have been developed to overcome these problems, i.e. to lower the communication overhead and to increase the energy savings.

Data-reduction techniques can be divided into three main groups: data compression, data prediction and in-network processing [1]. Data compression is applied to reduce the amount of information sent by source nodes. This scheme involves a coding strategy that represents data regardless of its semantics, and it is well suited to WSN applications that do not require the most recent measurements. In-network processing performs data aggregation while data is routed towards the sink node.
This paradigm aims to transform the raw data into less voluminous, refined data using summarization functions (minimum, maximum and average). For applications that require original and accurate measurements, such summarization may be inappropriate since it brings a loss of accuracy [2].

Data prediction techniques usually maintain two instances of a prediction model, one residing at the sink and the other at the sensor. To avoid a rapid deterioration in the predicted values, such approaches need to periodically validate and update their models. Data prediction techniques can be divided into three subclasses: stochastic approaches, time series forecasting and algorithmic approaches. The last are application-specific and usually apply some heuristics about the domain they explore. Stochastic approaches are used when the sensed phenomena can be modeled with a probability density function. These algorithms provide acceptable predictions but are usually inappropriate due to their computational overhead. The remaining data prediction models for WSN are those based on time series forecasting. Moving Average (MA), Autoregressive (AR) and Autoregressive Moving Average (ARMA) models are simple, easy to implement and provide acceptable accuracy [3][4].

In this paper, we investigate and compare time series forecasting techniques for WSN based on these three algorithms. The rest of the paper is organized as follows: the next section presents a brief overview of related work. The third section describes the process models used for data prediction: MA, AR and ARMA. The fourth section covers the simulation results. Finally, we conclude the paper in section five.

II. RELATED WORK

Time series forecasting in WSN remains insufficiently explored, despite the popularity of WSN over the last decade. Only a few well-known techniques from time series analysis have been implemented and appropriately evaluated on different WSN datasets.
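As a rough illustration of the prediction-based reduction outlined in the introduction (two identical model instances, one at the sensor and one at the sink, with transmission only when a reading departs from the prediction by more than a given margin), the following Python sketch may be helpful. It is not the implementation evaluated in this paper. The class and function names, the window size, the margin and the use of a plain moving average as the predictor are all assumptions made for illustration, as is the choice to store the predicted value on both sides whenever nothing is transmitted.

```python
from collections import deque


class MovingAveragePredictor:
    """Hypothetical predictor: mean of the last `window` stored values."""

    def __init__(self, window, initial):
        # deque with maxlen discards the oldest value automatically
        self.history = deque([initial], maxlen=window)

    def predict(self):
        return sum(self.history) / len(self.history)

    def update(self, value):
        self.history.append(value)


def simulate(readings, window=3, margin=0.5):
    """Return (values reconstructed at the sink, number of transmissions).

    Assumption: the first reading is always transmitted so that both
    model instances start from the same state.
    """
    first = readings[0]
    sensor = MovingAveragePredictor(window, first)
    sink = MovingAveragePredictor(window, first)
    reconstructed, sent = [first], 1

    for x in readings[1:]:
        predicted = sensor.predict()
        transmit = abs(x - predicted) > margin
        if transmit:
            sent += 1
        # Sensor side: keep the real reading only if it was transmitted.
        sensor.update(x if transmit else predicted)
        # Sink side: use the received reading, otherwise its own prediction,
        # which matches the sensor's because both models hold the same state.
        sink_value = x if transmit else sink.predict()
        sink.update(sink_value)
        reconstructed.append(sink_value)

    return reconstructed, sent


if __name__ == "__main__":
    values, sent = simulate([20.0, 20.1, 20.2, 25.0, 25.1])
    print(values, sent)
```

By construction, every value reconstructed at the sink lies within the chosen margin of the true reading, which is the accuracy guarantee that such a scheme trades against the number of transmissions.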
The most popular paradigm is the Dual Prediction Scheme (DPS) [3][5][6][7] (formerly known as Dual Kalman Filter). Here, each node runs a filter (or a model) that estimates the next measurement. The sink (or the base station) runs exactly the same model for each sensor in the network and makes the same predictions. Since the sensor measures the sensed quantity, it can check whether the predicted value differs from the sensed value by more than the predefined threshold. If the difference is below the threshold, both the sensor and the sink accept the predicted value and store it in memory instead of the measured value. Otherwise, the sensor sends the actual value to the sink node. Both the sensor and the sink then use this value to re-estimate the prediction model and update the filter weights simultaneously.

Romer and Santini in [5] choose Least Mean Square (LMS) over the Kalman Filter since it doesn't require a priori knowledge of the desired measurements, which implies that the sink and the sensors don't need to agree on a predefined model. In [6][7], the authors propose a modification of LMS that uses a variable step size parameter for fine-tuning the filter weights. Le Borgne and Santini in [3] present a general framework for DPS in which sensor nodes, using a racing mechanism [8], autonomously select a prediction model among K candidate prediction models: a constant prediction model (CM) and AR models of orders 1–5. The results obtained from 14 different