An Improved Scheme for Side Information Generation in a Distributed Video Coding System

Manal Jalloul and Mohamad Adnan Al-Alaoui
Electrical and Computer Engineering Department
American University of Beirut
Beirut, Lebanon

Lina Karam
Electrical and Computer Engineering Department
Arizona State University
Arizona, USA

Abstract: Side information (SI) generation plays a key role in determining the performance of a Distributed Video Coding (DVC) system. Current approaches to DVC rely on motion-compensated temporal interpolation (MCTI) to generate the SI at the decoder, where the SI is an estimate of the frame being decoded. This work presents a novel MCTI algorithm. In the proposed scheme, motion estimation uses only the low-frequency content of the key frames, whereas motion compensation is performed over the entire frequency range. The motivation behind this approach is that MCTI was found to provide better interpolation results in the low-frequency range than in the high-frequency range. An improved variable-size block matching algorithm (VSBMA) with 3-D recursive search (3DRS) is used for motion estimation. A novel motion-compensated enhancement (MCE) algorithm is also proposed. Simulation results are presented to illustrate the performance of the proposed scheme.

Keywords: Distributed Video Coding, Side Information generation, 3-D recursive search, variable-size block matching

I. INTRODUCTION

In recent years, emerging applications such as wireless low-power surveillance, multimedia sensor networks, wireless PC cameras, and mobile camera phones have challenged the traditional digital video coding paradigm represented by the standardization efforts of ITU-T VCEG and ISO/IEC MPEG. These applications have requirements that differ considerably from those of traditional video delivery systems. For some applications, e.g. mobile camera phones, low power consumption is essential at both the encoder and the decoder.
In other applications, notably those with a large number of encoders and only one decoder, e.g. surveillance, low-cost encoder devices are necessary. To fulfill these requirements, it is essential to have a low-power, low-complexity encoder, possibly at the expense of a higher-complexity decoder. Another important goal is to achieve a coding efficiency similar to that of traditional video coding schemes, i.e. the shift of complexity from the encoder to the decoder should ideally not compromise the coding efficiency.

Several results from Information Theory (Slepian-Wolf in 1973 [1], and Wyner-Ziv in 1976 [2]) suggest that this problem can be solved by exploiting source statistics, partially or totally, at the decoder. These results can be used to design a new class of coding algorithms, the so-called Distributed Video Coding (DVC) solutions. In such a system, the video is intraframe encoded and interframe decoded. The motion estimation block is thus shifted from the encoder to the decoder, where it generates what is referred to as Side Information (SI), which is needed for efficient decoding.

In current approaches to Distributed Video Coding [3-10], a subset of frames, called "key frames", is encoded and decoded using a conventional video codec. The frames between the key frames are "Wyner-Ziv frames", which are intraframe encoded but interframe decoded. Quantization followed by channel coding is applied to the Wyner-Ziv frames. The resulting systematic bits are discarded and only the parity bits are transmitted to the decoder. At the decoder, side information is generated for each Wyner-Ziv frame from previously decoded key frames. The side information is simply an estimate of the Wyner-Ziv frame. The channel decoder combines the side information and the parity bits to recover the video stream.
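The key-frame / Wyner-Ziv-frame split described above can be illustrated with a minimal sketch. It assumes a group of pictures of size 2 (even-indexed frames are key frames, odd-indexed frames are Wyner-Ziv frames), which is an assumption made for illustration only. It uses plain per-pixel averaging of the two neighbouring key frames as the side information. Averaging is a zero-motion stand-in chosen for brevity; it is not the MCTI scheme discussed in this paper. The function names `split_gop2`, `average_side_information`, and `mean_squared_error` are hypothetical.

```python
def split_gop2(frames):
    """Split a sequence into key frames (even indices) and
    Wyner-Ziv frames (odd indices). GOP size 2 is an assumption."""
    key = {i: f for i, f in enumerate(frames) if i % 2 == 0}
    wz = {i: f for i, f in enumerate(frames) if i % 2 == 1}
    return key, wz


def average_side_information(prev_key, next_key):
    """Zero-motion side information: per-pixel average of two key
    frames, rounded to the nearest integer (halves round up).
    This is a simple stand-in, not motion-compensated interpolation."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_key, next_key)
    ]


def mean_squared_error(x, y):
    """Mean squared error between two frames of equal size."""
    total = 0
    count = 0
    for row_x, row_y in zip(x, y):
        for a, b in zip(row_x, row_y):
            total += (a - b) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Three tiny 2x2 luminance frames; brightness rises linearly over time.
    frames = [
        [[10, 20], [30, 40]],
        [[12, 22], [32, 42]],
        [[14, 24], [34, 44]],
    ]
    key, wz = split_gop2(frames)
    for i, wz_frame in wz.items():
        # Each Wyner-Ziv frame is estimated from its two neighbouring key frames.
        si = average_side_information(key[i - 1], key[i + 1])
        print(i, si, mean_squared_error(si, wz_frame))
```

In this toy sequence the content changes linearly with no motion, so the averaged estimate matches the Wyner-Ziv frame exactly. With real motion the estimate degrades, and the decoder would then need more parity bits to correct it.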
In this framework, the side information is regarded as a distorted version of the source information, and the parity bits are used to correct it toward the source information. If the decoder detects errors, it requests additional parity bits from the encoder buffer through a feedback channel. Finally, the decoded bitstream is combined with the side information to reconstruct the Wyner-Ziv frame.

In a DVC system, the key task of exploiting source statistics is carried out by the SI generation process, which produces an estimate of the Wyner-Ziv (WZ) frame being decoded. SI quality therefore has a significant influence on the rate-distortion (RD) performance of DVC. Indeed, more accurate SI means that fewer parity bits need to be requested from the encoder to decode the WZ frame. Thus, the compression performance of the system can be improved by improving the SI generation scheme.

The most widely adopted approach to SI generation is motion-compensated temporal interpolation (MCTI). In basic MCTI [12], each video frame is divided into a fixed number of square blocks, and for each block a full search (FS) is conducted within a predefined window of an adjacent frame to find the best match. The resulting motion vectors are later used for frame interpolation in the interframe decoder of the DVC codec. This scheme suffers from two major drawbacks. First, because of the block-based FS, the computational requirements are very high and the resulting motion vectors exhibit low spatial coherence. Second, the quality of the resulting motion vectors is dependent on the assumption that each