An Energy-Efficient Framework for Data Aggregation in Wireless Sensor Networks Based on Distributed Source Coding

Tallal Osama El-Shabrawy
Information Engineering & Technology
German University in Cairo
Cairo, Egypt
tallal.el-shabrawy@guc.edu.eg

Nora Mohamed Mounir
Media Engineering & Technology
German University in Cairo
Cairo, Egypt
nora.mohamed@guc.edu.eg

Abstract—This paper presents a data aggregation and forwarding framework for wireless sensor networks (WSNs) that reduces energy consumption and hence prolongs network lifetime. The approach builds on the observation that WSNs usually contain a large number of sensor nodes with highly correlated data readings. The proposal is to deploy distributed source coding (DSC) to compress data messages, reducing transmission energy requirements while avoiding data redundancy. Specifically, a DSC construction is proposed that determines the number of bits a sensor node needs to encode a data message relative to its correlated neighbors, without exchanging excessive communication messages among them.

Keywords—Energy Consumption; Data Aggregation and Forwarding; Distributed Source Coding; Wireless Sensor Networks

978-1-61284-185-4/11/$26.00 ©2011 IEEE

I. INTRODUCTION

The importance of wireless sensor networks (WSNs) has increased over the past few years with the continuous development of sensor technology. While sensors might seem tiny and cheap, they are powerful devices with computing and communication facilities, used in many applications such as military, health, industry, and surveillance. Nonetheless, sensors are energy constrained, which makes energy consumption a critical matter for WSNs. Exchanging excessively large communication messages will eventually drain network energy and may thereby shorten the lifetime of a WSN. These concerns might act as obstacles to the expected huge growth of WSNs in the near future. Thus, many protocols have been proposed lately to offer a reliable practice for maximizing network lifetime by reducing energy consumption.

Reducing energy consumption can be achieved through different approaches: exploiting the network topology as proposed in [3, 7], routing as done in [4, 5], reducing data size and the number of transmissions as in [8] and [9], respectively, or data aggregation as explained in [6, 10]. The proposed framework, in contrast, is a data aggregation protocol based on encoding correlated data at sensors before forwarding them to the sink, with the aim of reducing energy consumption and hence maximizing network lifetime. The approach deploys distributed source coding (DSC) [1, 2] to encode aggregated data at a sensor node without the need to access its neighbors' correlated data. To determine the number of encoding bits, a simple correlation-tracking technique is employed. The framework was implemented and tested in a clustering-based environment that exploits multi-hop routing for aggregating data along the path between two end nodes. This helps reduce the number of data transmissions and enhances redundancy avoidance, thereby reducing energy consumption and improving overall system efficiency.

The paper is organized as follows. Section II provides technical background about DSC. The framework is presented in Section III. Section IV presents simulation results, and Section V concludes the work.

II. BACKGROUND

Distributed source coding, as a form of lossless data compression, was first introduced by Slepian and Wolf in 1973 [1]. The Slepian-Wolf theorem states that a pair of correlated, jointly distributed discrete random information sources can be encoded separately as efficiently as if the two sources were encoded together. To explain the Slepian-Wolf theorem, consider X and Y to be a pair of correlated discrete random information sources.
From Shannon's source coding theorem [13], the two sources can be compressed losslessly if they are encoded together at a rate given by their joint entropy H(X,Y). In that case, Y is compressed first into H(Y) bits/sample, and X is then compressed into H(X|Y) bits/sample, where both the encoder and the decoder have complete knowledge of Y. According to Slepian-Wolf, X and Y can be compressed separately while providing the same compression efficiency as if they were compressed together. This is achievable by compressing X and Y at rates Rx and Ry, respectively, such that [1]

Rx ≥ H(X|Y), Ry ≥ H(Y|X), and Rx + Ry ≥ H(X,Y)   (1)

In proving their theorem, Slepian and Wolf used the concept of random binning. Binning refers to dividing all possible outcomes for decoding X into sets (bins); the bits transmitted for X then represent the index of the bin to which X belongs. To see how this works, consider the following example, demonstrated in [12], where X and Y are two 3-bit discrete random words whose correlation can be described as dH(X,Y) ≤ 1, where dH(X,Y) denotes the Hamming distance between the two words. Their entropies H(X) and H(Y) are then equal to 3 bits. To encode X without the encoder accessing Y (while Y remains accessible to the decoder), all possible outcomes for decoding X are divided into four bins of pairs: Z00 = {000, 111}, Z01 = {001, 110}, Z10 = {010, 101}, Z11 = {011, 100}. Provided that the Hamming distance between the two members of each pair is the maximum (which equals 3
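The binning example above can be sketched in a few lines of code. The following is a minimal illustration (not from the paper itself, and the function names are ours): the encoder transmits only the 2-bit bin index in place of the 3-bit word X, and the decoder, which holds the side information Y with dH(X,Y) ≤ 1, recovers X by choosing the bin member closest to Y.

```python
# Minimal sketch of the Slepian-Wolf binning example from Section II.
# Assumption (as in the text): X and Y are 3-bit words with Hamming
# distance dH(X, Y) <= 1, and the decoder has full access to Y.

# The four bins; each pairs two words at Hamming distance 3.
BINS = {
    "00": ("000", "111"),
    "01": ("001", "110"),
    "10": ("010", "101"),
    "11": ("011", "100"),
}

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def encode(x):
    """Transmit only the 2-bit bin index instead of the 3-bit word X."""
    for index, members in BINS.items():
        if x in members:
            return index
    raise ValueError("x must be a 3-bit string")

def decode(index, y):
    """Recover X as the bin member closest to the side information Y.

    Because the two members of a bin are at Hamming distance 3 while
    dH(X, Y) <= 1, the closer member is always the correct X.
    """
    return min(BINS[index], key=lambda member: hamming(member, y))
```

For example, encode("011") returns the index "11", and a decoder holding y = "010" then recovers "011" as the bin member nearest to y. Note that if the four words within distance 1 of Y are equally likely, H(X|Y) = 2 bits, so the 2-bit index meets the Slepian-Wolf bound Rx ≥ H(X|Y) in (1).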