IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 18, NO. 6, JUNE 2000 977
An End-to-End Approach for Optimal Mode
Selection in Internet Video Communication: Theory
and Application
Dapeng Wu, Student Member, IEEE, Yiwei Thomas Hou, Member, IEEE, Bo Li, Member, IEEE,
Wenwu Zhu, Member, IEEE, Ya-Qin Zhang, Fellow, IEEE, and H. Jonathan Chao, Senior Member, IEEE
Abstract—Rate-distortion (R-D) optimized mode selec-
tion is a fundamental problem for video communication over
packet-switched networks. The classical R-D optimized mode
selection only considers quantization distortion at the source.
Such an approach cannot achieve global optimality in an
error-prone environment since it does not consider the packetization
behavior at the source, the transport path characteristics, and
receiver behavior. This paper presents an end-to-end approach to
generalize the classical theory of R-D optimized mode selection
for point-to-point video communication. We introduce a notion
of global distortion by taking into consideration both the path
characteristics (i.e., packet loss) and the receiver behavior (i.e.,
the error concealment scheme), in addition to the source behavior
(i.e., quantization distortion and packetization). We derive, for
the first time, a set of accurate global distortion metrics for
any packetization scheme. Equipped with the global distortion
metrics, we design an R-D optimized mode selection algorithm
to provide the best tradeoff between compression efficiency and
error resilience. The theory developed in this paper is general and
is applicable to many video coding standards, including H.261/263
and MPEG-1/2/4. As an application, we integrate our theory with
point-to-point MPEG-4 video conferencing over the Internet,
where a feedback mechanism is employed to convey the path
characteristics (estimated at the receiver) and receiver behavior
(error concealment scheme) to the source. Simulation results
conclusively demonstrate that our end-to-end approach offers
superior performance over the classical approach for Internet
video conferencing.
Index Terms—Error concealment, feedback, global distortion
metric, Internet, MPEG-4, packetization, R-D optimized mode
selection, video conferencing.
I. INTRODUCTION
Video communication over the Internet has become
an important application in recent years. A challenging
problem associated with Internet video communication lies
in how to cope with packet loss in the network and achieve
acceptable video quality at the receiver. This is because packet
loss is unavoidable in the Internet and may have significant
impact on perceptual quality.
Manuscript received May 15, 1999; revised November 1, 1999.
D. Wu is with the Department of Electrical and Computer Engineering,
Carnegie Mellon University, Pittsburgh, PA 15213 USA.
Y. T. Hou is with Fujitsu Laboratories of America, Sunnyvale, CA 94086 USA.
B. Li is with the Department of Computer Science, Hong Kong University of
Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
W. Zhu and Y.-Q. Zhang are with Microsoft Research, China, 5F, Beijing
Sigma Center, Zhichun Road Haidian District, Beijing 100080, China.
H. J. Chao is with the Department of Electrical Engineering, Polytechnic
University, Six Metrotech Center, Brooklyn, NY 11201 USA.
Publisher Item Identifier S 0733-8716(00)04341-9.
The effect of lost packets on the video presentation quality
depends on the coding scheme used at the source, the network
congestion status, and the error concealment scheme used at the
receiver. High-compression coding algorithms usually employ
inter-coding (i.e., prediction) to achieve efficiency. With these
coding algorithms, loss of a packet may degrade video quality
over a large number of frames, until the next intra-coded frame
is received. Intra-coding can effectively stop error propagation
at the cost of efficiency while inter-coding can achieve com-
pression efficiency at the risk of error propagation. Therefore, a
good mode selection between intra-mode and inter-mode should
be in place to enhance the robustness of the video communica-
tions using intra- and inter-coding.
For video communication over a network, a coding algorithm
such as H.263 or MPEG-4 [6] usually employs rate control to
match the output rate to the available bandwidth. The objective
of rate-controlled compression algorithms is to maximize the
video quality under the constraint of a given bit budget. This
can be achieved by choosing a mode that minimizes the quan-
tization distortion between the original frame/macroblock and
the reconstructed one under a given bit budget [9], [15], which
is the so-called rate-distortion (R-D) optimized mode selection.
We refer to such R-D optimized mode selection as the classical
approach. The classical approach cannot achieve global optimality
in an error-prone environment since it does not consider
the network congestion status and the receiver behavior.
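As an illustration of the classical approach, the constrained problem (minimize quantization distortion subject to a bit budget) is commonly relaxed into an unconstrained Lagrangian cost J = D + λR, where D is the quantization distortion, R is the number of bits, and λ sets the tradeoff. The sketch below assumes that relaxation. The `ModeCost` type, the `select_mode` helper, and all numerical values are hypothetical and do not come from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeCost:
    """Hypothetical per-macroblock measurements for one candidate mode."""
    mode: str          # e.g. "intra" or "inter"
    distortion: float  # quantization distortion (e.g. sum of squared errors)
    bits: int          # bits needed to code the macroblock in this mode


def select_mode(candidates, lam):
    """Return the candidate that minimizes the Lagrangian cost D + lam * R.

    Only source-side quantization distortion is considered here, which is
    exactly the limitation of the classical approach noted in the text.
    """
    if not candidates:
        raise ValueError("at least one candidate mode is required")
    return min(candidates, key=lambda c: c.distortion + lam * c.bits)


# Illustrative numbers only (assumed, not measured).
candidates = [
    ModeCost("intra", distortion=40.0, bits=300),
    ModeCost("inter", distortion=55.0, bits=120),
]
cheap_bits = select_mode(candidates, lam=0.01)   # bits are nearly free
costly_bits = select_mode(candidates, lam=0.5)   # bits are expensive
```

With these assumed values, a small λ favors the lower-distortion intra mode, while a large λ favors the cheaper inter mode. Neither choice accounts for packet loss or error concealment.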
This paper presents an end-to-end approach to solve the
fundamental problem of R-D optimized mode selection for
point-to-point video communication over packet-switched networks.
Under the end-to-end approach, we identify three factors
that have an impact on the video presentation quality at the
receiver, namely, the source behavior, the path characteristics,
and the receiver behavior. To put such an end-to-end approach
into a theoretical framework, we develop a theory for globally
optimal mode selection in a packet-lossy environment. We
begin with formulating the problem of globally optimal mode
selection using the notion of global distortion metric. Then we
describe the three factors in the end-to-end approach. We de-
rive, for the first time, a set of accurate global distortion metrics
for any packetization scheme. We show how to apply the global
distortion metrics to a specific packetization scheme. Equipped
with the global distortion metrics, we design an R-D optimized
mode selection algorithm to provide the best tradeoff between