VARIANCE-AWARE DISTORTION ESTIMATION FOR WIRELESS VIDEO COMMUNICATIONS

Yiftach Eisenberg, Fan Zhai, Carlos E. Luna, Thrasyvoulos N. Pappas, Randall Berry, and Aggelos K. Katsaggelos
Northwestern University, Department of ECE, Evanston, IL 60208, USA
E-mail: {yeisenbe, fzhai, carlos, pappas, rberry, aggk}@ece.northwestern.edu

ABSTRACT

The problem of encoding and transmitting a video sequence over a wireless channel is considered. Our objective is to minimize the end-to-end distortion subject to constraints on transmission energy and delay. In our approach, we jointly adapt the source-coding parameters and the transmission power per packet. We introduce the concept of “Variance-Aware Distortion Estimation” (VADE), and present a framework for controlling both the expected value and the variance of the end-to-end distortion. This framework is based on knowledge of how the video is compressed, the probability of packet loss, and the concealment strategy. To the best of our knowledge, this paper is the first to address the trade-off between the mean and the variance of the end-to-end distortion. Experimental results demonstrate the potential of the proposed approach.

1. INTRODUCTION

Transmission energy is a critical resource in wireless video communications [1]. Since most users of a wireless network are mobile, they must rely on a battery with a limited energy supply. Efficient use of transmission energy can extend battery lifetime, decrease the level of interference between users, and increase the overall network capacity. This paper builds on our prior work, some of which can be found in [2]. Our goal is to achieve the best video quality subject to constraints on transmission energy and delay. To accomplish this, we jointly consider error resilience and concealment techniques at the source-coding level, and transmission power management at the physical layer.
In this way, the transmission power/energy serves as an unequal error protection (UEP) mechanism.

In most video communication systems, the transmitter does not know exactly which packets are lost, but instead has an estimate of the probability of packet loss. Thus, from the transmitter's point of view, the distortion at the receiver is a random variable. Recent work on resilient video coding for packet-loss networks has primarily focused on minimizing the expected value of the end-to-end distortion [2,3,4,5,6]. A common feature among these works is that they all measure video quality by the expected distortion, where the expectation is computed over all possible packet loss patterns. Several methods have been proposed for calculating the expected distortion; they fall into two general categories. The first consists of optimal per-pixel estimation methods, such as [2,3,4], which can accurately calculate the expected value of the distortion under certain conditions. The second category consists of methods that use models to estimate the expected distortion [5,6]. Model-based methods are useful when either computational power is limited or closed-form expressions for the expected distortion are not known. The above is only a small sample of the work in this area.

Fig. 1. (a) Expected frame; (b) and (c) two loss realizations.

At the receiver, the end user sees only one of the many possible reconstructed sequences, depending on which packets are lost. Therefore, the actual distortion at the receiver generally does not equal the expected distortion. To illustrate this point, consider the images shown in Fig. 1. While the expected reconstructed frame (averaged over all possible loss realizations) may be reasonable, the quality at the receiver may vary greatly depending on which packets are lost.
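To make the distinction between the expected distortion and its realizations concrete, the following minimal Python sketch (not part of the original paper) simulates a frame of M packets, each lost independently with probability rho. A lost packet contributes a concealment distortion d_loss and a received packet contributes d_recv; these scalar per-packet distortions, and the independence assumption, are illustrative simplifications that ignore the temporal error propagation handled by the per-pixel methods of [2,3,4].

```python
import random

def simulate_distortion(rho, d_recv, d_loss, num_packets, num_trials, seed=0):
    """Monte Carlo sketch: each of num_packets packets is lost i.i.d. with
    probability rho.  A received packet contributes d_recv to the frame
    distortion; a lost (concealed) packet contributes d_loss."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_trials):
        total = sum(d_loss if rng.random() < rho else d_recv
                    for _ in range(num_packets))
        samples.append(total)
    mean = sum(samples) / num_trials
    var = sum((s - mean) ** 2 for s in samples) / num_trials
    return mean, var

# Under this simplified model the closed forms are
#   E[D]   = M * ((1 - rho) * d_recv + rho * d_loss)
#   Var[D] = M * rho * (1 - rho) * (d_loss - d_recv) ** 2
mean, var = simulate_distortion(rho=0.1, d_recv=1.0, d_loss=10.0,
                                num_packets=20, num_trials=20000)
```

Even when the Monte Carlo mean agrees with the closed-form E[D], individual realizations spread around it with a variance that grows with the gap between d_loss and d_recv; this spread is exactly the quantity VADE proposes to control alongside the mean.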
Therefore, in this paper we argue that the variance of the end-to-end distortion should also be considered when characterizing video quality in lossy packet networks. We introduce the concept of “variance-aware distortion estimation” (VADE), and present a framework for controlling both the expected value and the variance of the end-to-end distortion.

2. SYSTEM MODEL

Consider a video communication system in which the video is encoded using a block-based motion-compensated technique (e.g., H.263, MPEG-4). Each frame is divided into slices, each comprising consecutive macroblocks (MBs). Each slice is independently decodable, i.e., the decoding of one slice is not affected by the loss of other slices in the same frame. Losses in other frames, however, may cause temporal error propagation due to inter-frame prediction. After a slice is encoded, it is transmitted across a wireless channel as a separate packet. In the following, the terms slice and packet are used interchangeably.

Let M be the number of packets in a given frame and k be the packet index. For each packet, source-coding parameters, such as the coding mode (intra/inter/skip) and the quantization step-size for each MB, are specified. We use μ_k to denote the source-coding parameters of the kth packet, and μ = {μ_1, …, μ_M} to denote the coding parameters for all the packets in a frame. The number of bits used to encode the kth packet, B_k, is a function of μ_k; we write B_k(μ_k) to make this dependency explicit. In addition to μ_k, we assume that the transmission power for each packet, P_k, can be adjusted. We use P = {P_1, …, P_M} to denote the transmission powers for all the packets in a frame.

0-7803-7750-8/03/$17.00 ©2003 IEEE. ICIP 2003
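The per-packet quantities defined above can be collected into a simple data structure, sketched below in Python (not part of the original paper). The rate model B_k(μ_k) and the mapping from transmission power P_k to packet loss probability are hypothetical placeholders, since the paper does not specify them in this section; the sketch only illustrates how μ_k and P_k enter the problem as joint per-packet decision variables.

```python
from dataclasses import dataclass
import math

@dataclass
class PacketParams:
    """Decision variables for the k-th packet (slice) of a frame."""
    mode: str      # coding mode, e.g. 'intra', 'inter', or 'skip'
    q_step: int    # quantization step-size
    power: float   # transmission power P_k

def bits(p):
    """Placeholder rate model B_k(mu_k): coarser quantization -> fewer bits.
    The numeric values are purely illustrative."""
    base = {'intra': 2000, 'inter': 800, 'skip': 0}[p.mode]
    return base // max(p.q_step, 1) * 16

def loss_probability(power, noise=1.0):
    """Hypothetical monotone map from transmission power to loss probability:
    higher power -> lower loss.  Assumed for illustration only."""
    return math.exp(-power / noise)

# One frame with M = 2 packets: mu = {mu_1, mu_2}, P = {P_1, P_2}.
frame = [PacketParams('intra', 8, 2.0), PacketParams('inter', 16, 1.0)]
total_bits = sum(bits(p) for p in frame)
# Transmission energy ~ power x airtime, with airtime proportional to bits.
total_energy = sum(p.power * bits(p) for p in frame)
```

Under this kind of model, the optimizer trades off μ_k (which controls B_k and the source distortion) against P_k (which controls the loss probability and the energy cost), which is the joint adaptation the paper proposes.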