IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 3, MARCH 2010 407

Unequal Error Protection for Robust Streaming of Scalable Video Over Packet Lossy Networks

Ehsan Maani, Student Member, IEEE, and Aggelos K. Katsaggelos, Fellow, IEEE

Abstract—Efficient bit stream adaptation and resilience to packet losses are two critical requirements in scalable video coding for transmission over packet-lossy networks. Scalable layers differ greatly in importance, as measured by their contribution to the overall video quality. This difference is especially significant in scalable H.264/advanced video coding (AVC) video because of the prediction hierarchy it employs and the drift that propagates when quality refinements are missing. Therefore, efficient bit stream adaptation and unequal protection of these layers are of special interest in scalable H.264/AVC video. This paper proposes an algorithm that accurately estimates the overall distortion of decoder-reconstructed frames caused by enhancement layer truncation, drift/error propagation, and error concealment in scalable H.264/AVC video. The method recursively computes the total expected decoder distortion at the picture level for each layer in the prediction hierarchy. This keeps the computational cost low, since it bypasses highly complex pixel-level motion compensation operations. Simulation results show accurate distortion estimation at various channel loss rates. The estimate is further integrated into a cross-layer optimization framework for optimized bit extraction and content-aware channel rate allocation. Experimental results demonstrate that precise distortion estimation enables the proposed transmission system to achieve a significantly higher average video peak signal-to-noise ratio than a conventional content-independent system.

Index Terms—Channel coding, error correction coding, multimedia communication, video coding, video signal processing.

I. Introduction

Multimedia applications involving the transmission of video over communication networks are rapidly increasing in popularity. These applications include, but are not limited to, multimedia messaging, video telephony, video conferencing, wireless and wired Internet video streaming, and cable and satellite TV broadcasting. In general, the communication networks supporting these applications are characterized by wide variability in throughput, delay, and packet loss. Furthermore, a variety of receiving devices with different resources and capabilities are commonly connected to a network.

Scalable video coding (SVC) is designed to deal with this heterogeneity of modern communication networks, which makes it highly suitable for video transmission and storage. A video bit stream is called scalable when parts of it can be removed in such a way that the resulting substream forms a valid bit stream representing the content of the original at a lower resolution and/or quality. Traditionally, however, providing scalability has come at the cost of a significant loss in coding efficiency and an increase in decoder complexity. Primarily for this reason, the scalable profiles of most prior international coding standards, such as H.262 MPEG-2 Video, H.263, and MPEG-4 Visual, have rarely been used.

Manuscript received January 7, 2009; revised June 11, 2009. First version published November 3, 2009; current version published March 5, 2010. This paper was recommended by Associate Editor J. Ridge. E. Maani is with the School of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60608 USA (e-mail: ehssan@northwestern.edu). A. K. Katsaggelos is with the Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208 USA (e-mail: aggk@eecs.northwestern.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2009.2035846
Designed with the experience of past scalable coding tools in mind, the newly developed Scalable Extension of H.264/advanced video coding (AVC) [1] provides high coding efficiency, high bit rate adaptability, and low decoder complexity. The new SVC standard was approved as Amendment 3 of the AVC standard. Its base layer is fully compatible with AVC, so it can be decoded by existing AVC decoders. The SVC design supports spatial, temporal, and quality scalability. The video bit stream generated by an SVC encoder is commonly structured in layers, consisting of a base layer (BL) and one or more enhancement layers (ELs). Each enhancement layer improves either the resolution (spatial or temporal) or the quality of the video sequence. Each layer representing a specific spatial or temporal resolution is identified by a dependence identifier D or a temporal identifier T, respectively. Moreover, quality refinement layers inside each dependence layer are identified by a quality identifier Q. In some cases, dependence layers may have the same spatial resolution, which results in coarse-grain quality scalability. A detailed description of SVC can be found in [2]. In this paper, the term SVC is used both for the concept of scalable coding in general and for the particular design of the scalable extension of the H.264/AVC standard.

Most modern communication channels (e.g., the Internet or wireless channels) exhibit wide fluctuations in throughput and packet loss rates. Bit stream adaptation in such environments is critical in determining the video quality perceived by the end user. In SVC, bit stream adaptation is achieved by deliberately discarding a number of network abstraction layer (NAL) units, at the transmitter or inside the network before they reach the decoder, so that a particular average bit rate and/or resolution is attained.
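As a rough illustration of this kind of adaptation, the following Python sketch models each NAL unit only by its (D, T, Q) identifiers and a payload size, keeps the units at or below a chosen operating point, and picks the highest quality level whose substream fits a byte budget (a stand-in for a target average bit rate over some time window). The NalUnit class, the function names, the toy sizes, and the independent thresholding of D, T, and Q are hypothetical simplifications for illustration only; neither the normative SVC sub-bitstream extraction process nor the optimized bit extraction proposed in this paper is reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified model of an SVC NAL unit: layer identifiers plus size."""
    d: int           # dependence identifier D (spatial or coarse-grain quality layer)
    t: int           # temporal identifier T (temporal layer)
    q: int           # quality identifier Q (quality refinement within a dependence layer)
    size_bytes: int  # payload size in bytes


def extract_substream(units: List[NalUnit], max_d: int, max_t: int,
                      max_q: int) -> List[NalUnit]:
    """Keep the NAL units at or below the operating point (max_d, max_t, max_q).

    Simplification: each identifier is thresholded independently; the normative
    SVC extraction process has additional rules that are not modeled here.
    """
    return [u for u in units if u.d <= max_d and u.t <= max_t and u.q <= max_q]


def substream_bytes(units: List[NalUnit], max_d: int, max_t: int, max_q: int) -> int:
    """Total size in bytes of the extracted substream."""
    return sum(u.size_bytes for u in extract_substream(units, max_d, max_t, max_q))


def highest_q_within_budget(units: List[NalUnit], max_d: int, max_t: int,
                            budget_bytes: int) -> Optional[int]:
    """Largest max_q whose substream fits in budget_bytes, or None if even Q = 0 does not fit."""
    top_q = max((u.q for u in units), default=0)
    best = None
    for q in range(top_q + 1):
        if substream_bytes(units, max_d, max_t, q) <= budget_bytes:
            best = q
        else:
            # Raising max_q only adds units, so the size never decreases: stop here.
            break
    return best


if __name__ == "__main__":
    # Toy bit stream: the sizes are made-up numbers for illustration only.
    units = [
        NalUnit(d=0, t=0, q=0, size_bytes=1000),
        NalUnit(d=0, t=0, q=1, size_bytes=400),
        NalUnit(d=0, t=1, q=0, size_bytes=300),
        NalUnit(d=0, t=1, q=1, size_bytes=200),
        NalUnit(d=1, t=0, q=0, size_bytes=800),
    ]
    print(substream_bytes(units, max_d=0, max_t=1, max_q=0))                    # 1300
    print(highest_q_within_budget(units, max_d=0, max_t=1, budget_bytes=1600))  # 0
```

Because every operating point with a larger max_q contains all units of the smaller one, the substream size is nondecreasing in max_q, which is what allows the search to stop at the first budget overflow. Note that this sketch treats every NAL unit as equally valuable per byte; the distortion-driven extraction and unequal protection discussed in this paper exist precisely because that assumption does not hold.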
1051-8215/$26.00 © 2010 IEEE

In addition to bit rate adaptation, NAL units may be lost in the channel (due to, for example, excessive delay or buffer overflow) or arrive erroneous at the