IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 18, NO. 6, JUNE 2000 977

An End-to-End Approach for Optimal Mode Selection in Internet Video Communication: Theory and Application

Dapeng Wu, Student Member, IEEE, Yiwei Thomas Hou, Member, IEEE, Bo Li, Member, IEEE, Wenwu Zhu, Member, IEEE, Ya-Qin Zhang, Fellow, IEEE, and H. Jonathan Chao, Senior Member, IEEE

Abstract—Rate-distortion (R-D) optimized mode selection is a fundamental problem for video communication over packet-switched networks. The classical R-D optimized mode selection only considers quantization distortion at the source. Such an approach is unable to achieve global optimality in an error-prone environment since it does not consider the packetization behavior at the source, the transport path characteristics, and the receiver behavior. This paper presents an end-to-end approach to generalize the classical theory of R-D optimized mode selection for point-to-point video communication. We introduce a notion of global distortion by taking into consideration both the path characteristics (i.e., packet loss) and the receiver behavior (i.e., the error concealment scheme), in addition to the source behavior (i.e., quantization distortion and packetization). We derive, for the first time, a set of accurate global distortion metrics for any packetization scheme. Equipped with the global distortion metrics, we design an R-D optimized mode selection algorithm to provide the best tradeoff between compression efficiency and error resilience. The theory developed in this paper is general and is applicable to many video coding standards, including H.261/263 and MPEG-1/2/4. As an application, we integrate our theory with point-to-point MPEG-4 video conferencing over the Internet, where a feedback mechanism is employed to convey the path characteristics (estimated at the receiver) and receiver behavior (error concealment scheme) to the source.
Simulation results conclusively demonstrate that our end-to-end approach offers superior performance over the classical approach for Internet video conferencing.

Index Terms—Error concealment, feedback, global distortion metric, Internet, MPEG-4, packetization, R-D optimized mode selection, video conferencing.

Manuscript received May 15, 1999; revised November 1, 1999.
D. Wu is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA.
Y. T. Hou is with Fujitsu Laboratories of America, Sunnyvale, CA 94086 USA.
B. Li is with the Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
W. Zhu and Y.-Q. Zhang are with Microsoft Research, China, 5F, Beijing Sigma Center, Zhichun Road Haidian District, Beijing 100080, China.
H. J. Chao is with the Department of Electrical Engineering, Polytechnic University, Six Metrotech Center, Brooklyn, NY 11201 USA.
Publisher Item Identifier S 0733-8716(00)04341-9.

I. INTRODUCTION

VIDEO communication over the Internet has become an important application in recent years. A challenging problem associated with Internet video communication lies in how to cope with packet loss in the network and achieve acceptable video quality at the receiver. This is because packet loss is unavoidable in the Internet and may have a significant impact on perceptual quality. The effect of lost packets on the video presentation quality depends on the coding scheme used at the source, the network congestion status, and the error concealment scheme used at the receiver. High-compression coding algorithms usually employ inter-coding (i.e., prediction) to achieve efficiency. With these coding algorithms, loss of a packet may degrade video quality over a large number of frames, until the next intra-coded frame is received.
Intra-coding can effectively stop error propagation at the cost of efficiency, while inter-coding can achieve compression efficiency at the risk of error propagation. Therefore, a good selection between intra-mode and inter-mode is needed to enhance the robustness of video communication that uses intra- and inter-coding.

For video communication over a network, a coding algorithm such as H.263 or MPEG-4 [6] usually employs rate control to match the output rate to the available bandwidth. The objective of rate-controlled compression algorithms is to maximize the video quality under the constraint of a given bit budget. This can be achieved by choosing a mode that minimizes the quantization distortion between the original frame/macroblock and the reconstructed one under a given bit budget [9], [15], which is the so-called rate-distortion (R-D) optimized mode selection. We refer to such R-D optimized mode selection as the classical approach. The classical approach is not able to achieve global optimality in an error-prone environment since it does not consider the network congestion status and the receiver behavior.

This paper presents an end-to-end approach to solve the fundamental problem of R-D optimized mode selection for point-to-point video communication over packet-switched networks. Under the end-to-end approach, we identify three factors that have an impact on the video presentation quality at the receiver, namely, the source behavior, the path characteristics, and the receiver behavior. To put such an end-to-end approach into a theoretical framework, we develop a theory for globally optimal mode selection in a packet-lossy environment. We begin by formulating the problem of globally optimal mode selection using the notion of a global distortion metric. Then we describe the three factors in the end-to-end approach. We derive, for the first time, a set of accurate global distortion metrics for any packetization scheme.
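To make the contrast between the classical cost and a loss-aware cost concrete, the following is a minimal sketch and not the global distortion metrics derived in this paper. It assumes a Lagrangian cost of the form distortion plus lambda times bits (a common way to handle a bit-budget constraint) and a toy expected-distortion model driven by a single packet loss probability. The names `Mode`, `classical_cost`, `loss_aware_cost`, and `select_mode`, the `d_ref_error` term, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    """Hypothetical per-macroblock figures for one coding mode."""
    name: str
    bits: float         # bits spent coding the macroblock in this mode
    d_quant: float      # quantization distortion when the packet arrives
    d_conceal: float    # distortion after concealment when the packet is lost
    d_ref_error: float  # error inherited if the prediction reference was lost


def classical_cost(mode, lam):
    """Classical cost: quantization distortion plus lambda times rate."""
    return mode.d_quant + lam * mode.bits


def loss_aware_cost(mode, lam, p_loss):
    """Toy end-to-end cost (an assumption, not the paper's metric).

    If the packet arrives, the macroblock shows its quantization
    distortion plus, with probability p_loss, the error inherited from a
    lost reference. If the packet is lost, the receiver conceals it.
    """
    received = mode.d_quant + p_loss * mode.d_ref_error
    expected = (1.0 - p_loss) * received + p_loss * mode.d_conceal
    return expected + lam * mode.bits


def select_mode(modes, cost):
    """Return the name of the mode with the smallest cost."""
    return min(modes, key=cost).name


# Illustrative numbers only: intra costs more bits but inherits no error.
INTRA = Mode("intra", bits=400, d_quant=20, d_conceal=300, d_ref_error=0)
INTER = Mode("inter", bits=150, d_quant=22, d_conceal=300, d_ref_error=600)
MODES = [INTRA, INTER]
LAMBDA = 0.1

if __name__ == "__main__":
    print(select_mode(MODES, lambda m: classical_cost(m, LAMBDA)))
    for p in (0.01, 0.10):
        print(p, select_mode(MODES, lambda m: loss_aware_cost(m, LAMBDA, p)))
```

With these illustrative numbers, the classical cost selects inter-mode regardless of the path, whereas the toy loss-aware cost still selects inter-mode at a 1% loss probability but switches to intra-mode at 10%. When the loss probability is zero, the toy cost reduces to the classical one.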
We show how to apply the global distortion metrics to a specific packetization scheme. Equipped with the global distortion metrics, we design an R-D optimized mode selection algorithm to provide the best tradeoff between