VARIANCE-AWARE DISTORTION ESTIMATION
FOR WIRELESS VIDEO COMMUNICATIONS
Yiftach Eisenberg, Fan Zhai, Carlos E. Luna,
Thrasyvoulos N. Pappas, Randall Berry, and Aggelos K. Katsaggelos
Northwestern University, Department of ECE, Evanston, IL 60208, USA
E-mail: {yeisenbe, fzhai, carlos, pappas, rberry, aggk}@ece.northwestern.edu
ABSTRACT
The problem of encoding and transmitting a video sequence
over a wireless channel is considered. Our objective is to
minimize the end-to-end distortion subject to constraints on
transmission energy and delay. In our approach, we jointly
adapt the source-coding parameters and transmission power per
packet. We introduce the concept of “Variance-Aware
Distortion Estimation” (VADE), and present a framework for
controlling both the expected value and the variance of the end-
to-end distortion. This framework is based on knowledge of how
the video is compressed, the probability of packet loss, and the
concealment strategy. To the best of our knowledge, this paper
is the first to address the trade-off between the mean and
variance of the end-to-end distortion. Experimental results
demonstrate the potential of the proposed approach.
1. INTRODUCTION
Transmission energy is a critical resource in wireless video
communications [1]. Since most users of a wireless network are
mobile, they must rely on a battery with a limited energy supply.
Efficiently utilizing transmission energy can extend this
battery's lifetime, decrease the level of interference between
users, and increase the overall network capacity. This paper
builds on our prior work, some of which can be found in [2].
Our goal is to achieve the best video quality subject to
constraints on transmission energy and delay. To accomplish this, we
jointly consider error resilience and concealment techniques at
the source-coding level, and transmission power management at
the physical layer. In this way, the transmission power/energy is
used as an unequal error protection (UEP) mechanism.
In most video communication systems, the transmitter does
not know exactly which packets are lost, but instead has an
estimate of the probability of packet loss. Thus, from the point
of view of the transmitter, the distortion at the receiver is a
random variable. Recent work on resilient video coding for
packet loss networks has primarily focused on minimizing the
expected value of the end-to-end distortion [2,3,4,5,6]. A
common feature among these works is that they all measure
video quality by the expected distortion, where the expectation
is computed with respect to all the possible packet loss patterns.
Several methods have been proposed for calculating the
expected distortion. These methods can be divided into two
general categories. The first comprises optimal per-pixel
estimation methods, such as [2,3,4], which can accurately
calculate the expected value of the distortion under certain conditions.
Fig. 1. (a) Expected frame, (b) and (c) two loss realizations.
The second category consists of methods that use models to
estimate the expected distortion [5,6]. Model-based methods are
useful when either computational power is limited or closed-form
expressions for the expected distortion are not known. The
above is only a small sample of the work in this area.
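The expected-distortion criterion discussed above can be made concrete with a brute-force sketch. The function below (an illustration, not any of the cited methods) enumerates every packet-loss pattern and weights the resulting distortion by the pattern's probability, assuming independent packet losses; the caller supplies a hypothetical distortion_fn mapping a loss pattern to its end-to-end distortion. Its exponential cost in the number of packets is precisely why per-pixel recursions and model-based estimates are used in practice.

```python
from itertools import product

def expected_distortion(loss_probs, distortion_fn):
    """Brute-force E[D]: enumerate every packet-loss pattern.

    loss_probs    -- per-packet loss probabilities [rho_1, ..., rho_M]
    distortion_fn -- maps a loss pattern (tuple of 0/1, 1 = lost)
                     to the resulting end-to-end distortion
    """
    M = len(loss_probs)
    e_d = 0.0
    for pattern in product((0, 1), repeat=M):
        # Probability of this realization, assuming independent losses.
        prob = 1.0
        for rho, lost in zip(loss_probs, pattern):
            prob *= rho if lost else (1.0 - rho)
        e_d += prob * distortion_fn(pattern)
    return e_d
```

For a frame with M packets this loop visits 2^M patterns, so it is only feasible for very small M; the per-pixel methods of the first category compute the same expectation recursively instead.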
At the receiver, the end user sees only one of the many
possible reconstructed sequences, depending on which packets
are lost. Therefore, the actual distortion at the receiver is not
equal to the expected distortion. To illustrate this point, consider
the images shown in Fig. 1. While the expected reconstructed
frame (averaged over all possible loss realizations) may be
reasonable, the quality at the receiver may vary greatly based on
which packets are lost. Therefore, in this paper we argue that the
variance of the end-to-end distortion should also be considered
when characterizing video quality in lossy packet networks. We
introduce the concept of “variance-aware distortion estimation”
(VADE), and present a framework for controlling both the
expected value and the variance of the end-to-end distortion.
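The point illustrated by Fig. 1 can be quantified with a toy calculation. Under the simplifying assumption that each lost packet k independently adds a fixed distortion d_k (ignoring the inter-frame error propagation of a real codec), the mean and variance of the total distortion have closed forms, and two hypothetical allocations can have identical expected distortion but very different variances:

```python
def mean_var_distortion(rho, d):
    """Mean and variance of the total distortion, assuming packet k
    is lost independently with probability rho[k] and, when lost,
    adds a fixed distortion d[k] (a simplification that ignores
    temporal error propagation)."""
    mean = sum(r * dk for r, dk in zip(rho, d))
    var = sum(dk * dk * r * (1.0 - r) for r, dk in zip(rho, d))
    return mean, var

# Two hypothetical allocations with the same expected distortion:
# A concentrates the risk in one packet, B spreads it over two.
stats_a = mean_var_distortion([0.1], [10.0])        # mean 1.0, variance 9.0
stats_b = mean_var_distortion([0.1, 0.1], [5.0, 5.0])  # mean 1.0, variance 4.5
```

A system that minimizes only the expected distortion cannot distinguish these two allocations, whereas a variance-aware criterion prefers the second, which delivers more consistent quality to the end user.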
2. SYSTEM MODEL
Consider a video communication system where the video is
encoded using a block-based motion-compensated technique
(e.g., H.263, MPEG-4). Each frame is divided into slices, each
composed of consecutive macroblocks (MBs). Each slice
is independently decodable, i.e., the decoding of one slice is not
affected by the loss of other slices in the same frame. Losses in
other frames may cause temporal error propagation due to inter-
frame prediction. After a slice is encoded, it is transmitted
across a wireless channel as a separate packet. In the following,
slice and packet will be used interchangeably. Let M be the
number of packets in a given frame and k be the packet index.
For each packet, source-coding parameters, such as the
coding mode (intra/inter/skip) and quantization step-size for
each MB, are specified. We use μ_k to denote the source-coding
parameters for the kth packet, and μ = {μ_1, …, μ_M} to denote the
coding parameters for all the packets in a frame. The number of
bits used to encode the kth packet, B_k, is a function of μ_k; we
use B_k(μ_k) to explicitly indicate this dependency.
In addition to μ_k, we assume that the transmission power for
each packet, P_k, can be adjusted. We use P = {P_1, …, P_M} to
denote the transmission power for all the packets in a frame.
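The per-packet decision variables of this section can be pictured as a small data structure. The sketch below is purely hypothetical: the field names and the toy rate function standing in for B_k(μ_k) are illustrative assumptions, since the actual rate depends on the encoder and the video content.

```python
from dataclasses import dataclass

@dataclass
class PacketParams:
    """Hypothetical container for one packet's decision variables:
    coding parameters mu_k (mode, quantizer) and transmission power P_k."""
    mode: str        # MB coding mode: "intra", "inter", or "skip"
    q_step: int      # quantization step-size
    power_mw: float  # transmission power P_k in milliwatts

def bits_for_packet(p: PacketParams) -> int:
    """Toy stand-in for B_k(mu_k): coarser quantization yields fewer
    bits, and intra coding costs more than inter or skip. Illustrative
    only; a real encoder's rate model would replace this."""
    base = 4000 if p.mode == "intra" else 1500 if p.mode == "inter" else 100
    return max(1, base // p.q_step)

# One frame as a list of per-packet parameter choices.
frame = [PacketParams("intra", 8, 120.0), PacketParams("inter", 16, 60.0)]
total_bits = sum(bits_for_packet(p) for p in frame)
```

An optimizer in this framework would search over such per-packet choices (μ_k, P_k) jointly, trading bits and transmission power against the resulting distortion statistics.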
0-7803-7750-8/03/$17.00 ©2003 IEEE. ICIP 2003