Using H.264/AVC-based Scalable Video Coding (SVC) for Real Time Streaming in Wireless IP Networks

Thomas Schierl, Cornelius Hellge, Shpend Mirta, Karsten Grüneberg, and Thomas Wiegand
Fraunhofer Institute for Telecommunications – Heinrich-Hertz-Institut
Einsteinufer 37, D-10587 Berlin, Germany
[ schierl | hellge | mirta | grueneberg | wiegand ]@hhi.fraunhofer.de

Abstract: A streaming system based on the Scalable Video Coding (SVC) extension of H.264/AVC is presented. SVC allows data rate adaptation without re-encoding, simply by dropping packets of the bit stream. This enables, for instance, multicast services that reach clients with heterogeneous capabilities at the same time, while consuming less bit rate than simulcasting the services. Additionally, the robustness of a streaming connection against packet losses can be significantly increased if the different layers of the coded video stream are unequally protected by a forward error correction scheme. In this case, users experience graceful degradation of image quality rather than visible errors or interruptions. New services using SVC can also take advantage of a backward-compatible base layer. Different use cases for SVC streaming in wireless IP networks are presented, together with selected simulation results on the improved error robustness for a DVB-H wireless transmission.

I. INTRODUCTION

Video streaming using a conventional (single-layer) codec through a point-to-point (unicast) connection can deal with variable throughput using feedback-based transmission and data rate adaptation at the server side [1]. If the same content is streamed to several clients (via multicast or broadcast), the server cannot adapt the data rate for each individual client. A similar problem exists if the same content is streamed to a number of clients with different capabilities.
In that case, simulcast transmission of individual streams matched to each capability in terms of resolution, frame rate, and bit rate can waste network resources. One basic way in which scalable video coding can enhance a streaming system is a server that multicasts a layered video stream to a variety of clients [2]. Reception of the bit stream’s base layer is always required, since it is needed to decode even the base quality. In addition, several nested enhancement layers exist. A particular layer requires the presence of all lower layers it depends on. In the best case (highest available playback quality), the scalable bit stream is received completely. If some enhancement layers of the stream are not forwarded to the terminal, the bit rate is reduced at the cost of lower image fidelity (Signal-to-Noise Ratio), lower frame rate, or lower spatial resolution.

The next sections introduce the technique of scalable video coding and its implications for the transport protocol layer. Thereafter, different application scenarios are explained. Finally, an actual application of layered transmission in DVB-H is presented. For this application, selected simulation results show how unequal error protection of layered video in a DVB-H transmission affects its response to packet losses.

II. SCALABLE VIDEO CODING

The basic SVC design [3], which is an extension of the H.264/AVC video coding standard [4], can be classified as a layered video codec. Coder structure and coding efficiency depend on the scalability features required by an application. Figure 1 shows a typical coder structure with two spatial layers, each of which contains a fidelity enhancement layer.
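The layer dependencies and the "adaptation by dropping packets" principle described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the layer names (`S0`, `S0+F`, `S1`, `S1+F`), the strictly nested dependency chain, the bit rates, and the `(layer_name, payload)` packet representation are all hypothetical and chosen only to mirror a structure with two spatial layers and one fidelity enhancement each. Real SVC streams signal layer membership in the NAL unit header, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int          # hypothetical bit rate contributed by this layer alone
    depends_on: tuple  # names of the layers this layer directly requires


# Hypothetical layer set: two spatial layers (S0, S1), each with one
# fidelity enhancement (+F). The nested chain is an assumption.
LAYERS = {
    "S0": Layer("S0", 128, ()),            # base layer
    "S0+F": Layer("S0+F", 128, ("S0",)),
    "S1": Layer("S1", 384, ("S0+F",)),
    "S1+F": Layer("S1+F", 384, ("S1",)),
}


def required_layers(target, layers=LAYERS):
    """Return the target layer plus every layer it transitively depends on."""
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(layers[name].depends_on)
    return needed


def extract_substream(packets, target, layers=LAYERS):
    """Adapt without re-encoding: drop every packet not needed for `target`.

    `packets` is a list of (layer_name, payload) tuples; order is preserved.
    """
    keep = required_layers(target, layers)
    return [p for p in packets if p[0] in keep]


def best_target(budget_kbps, layers=LAYERS):
    """Pick the layer whose cumulative bit rate is highest within the budget.

    Returns None if even the base layer does not fit.
    """
    best, best_rate = None, -1
    for name in layers:
        rate = sum(layers[n].kbps for n in required_layers(name, layers))
        if best_rate < rate <= budget_kbps:
            best, best_rate = name, rate
    return best
```

In this sketch a network element would call `best_target` with a client's available bit rate and then `extract_substream` to forward only the packets of the chosen operation point; the same encoded stream serves every client.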
[Figure 1: Coder structure example with two spatial layers. Block labels: spatial decimation; hierarchical MCP and intra prediction; base layer coding (texture, motion); fidelity enhancement; inter-layer prediction (intra, motion, residual); H.264/AVC-conforming encoder; H.264/AVC-conforming base layer bitstream; enhancement layer bitstreams; multiplex; scalable bit stream.]

Part of the work is funded by the European Commission under contract number FP6-IST-0028097, project ASTRALS.

The hybrid video coding approach is extended so that a wide range of spatio-temporal and fidelity scalability is achieved. An SVC bitstream consists of a base layer and one or several enhancement layers. Removing enhancement layers leads to a decoded video signal of reduced temporal (frame rate), SNR (picture fidelity), or spatial (picture resolution) quality. The base layer is a plain H.264/AVC bitstream, ensuring backward compatibility with existing receivers. SVC’s temporal scaling functionality is often based on a temporal decomposition using hierarchical B-frames. It has