Cross-Layer Detection of Visual Impairments in H.264/AVC Video Sequences streamed over UMTS Networks

Luca Superiori, Olivia Nemethova, Wolfgang Karner, Markus Rupp
Institute of Communications and Radio-Frequency Engineering
Vienna University of Technology, Austria
Gusshausstrasse 25/389, A-1040 Vienna, Austria
Email: {lsuper, onemeth, wkarner, mrupp}@nt.tuwien.ac.at

Abstract—Incorrectly received packets in low-rate video sequences result in the loss of considerably large picture areas that have to be concealed. The performance of error concealment decreases with the size of the interpolated picture area. The incorrectly received packets may still contain some correct information that can be exploited at the decoder. In this work we propose using information from the link layer of UMTS (Universal Mobile Telecommunications System) at the application layer for better pre-localization of errors in the bitstream domain. A syntax check, together with the detection of impairments in the pixel domain, decides which part of an incorrectly received packet is concealed. We evaluate the results using error traces from a live UMTS network and an H.264/AVC (Advanced Video Coding) encoded video stream. Apart from the reduced complexity facilitated by the cross-layer approach, the proposed method gains 1.09 dB in Y-PSNR compared to a slice rejection mechanism.

Index Terms—Video streaming, UMTS, H.264/AVC, error resilience, error detection, cross-layer.

I. INTRODUCTION

H.264/AVC (Advanced Video Coding) [1], [2] is currently the newest and best performing video codec. It has been developed by the Joint Video Team (JVT) of ITU (International Telecommunication Union) and MPEG (Moving Picture Experts Group). The codec defines several profiles covering a wide range of applications. In the following, we refer to its application to video transmission over UMTS.
The 3GPP (3rd Generation Partnership Project) specification [3] requires the mobile terminal to support H.264/AVC in its baseline profile.

H.264/AVC is a hybrid block-based video codec. Each frame is subdivided into macroblocks of 16 × 16 pixels. A macroblock is encoded by means of spatial and temporal prediction and entropy coding. The encoded video data is segmented into NALUs (Network Abstraction Layer Units), each containing a video slice. For streaming and real-time transmission, each NALU is further encapsulated into one RTP (Real-time Transport Protocol)/UDP (User Datagram Protocol)/IP (Internet Protocol) packet. This protocol stack is recommended in [4] for continuous media playback at the receiver side. UDP is an unreliable protocol that does not provide retransmissions. It contains a 16-bit checksum for error detection. If a UDP packet fails the checksum test, it is usually discarded [4], [5]. In the considered low-resolution (usually CIF (352×288 pixels) or even QCIF (176×144 pixels)) and low bit-rate (64-384 kbps) scenario, this can result in the loss of a considerable part of the frame.

The encoded video data preceding the error occurrence in the damaged packet is still valid and can be used to correctly reconstruct a part of the encoded frame. In [6] the authors proposed a smart decoder able to detect syntax errors at macroblock level. Packets that fail the UDP checksum are decoded until a syntax error arises, and only the subsequent macroblocks are concealed. The method showed significant improvement compared to the classical slice rejection mechanism. However, it still suffers from a limited detection capability and from a distance between the position where the error occurs and the position where it is detected.

In order to enhance the performance of the syntax analysis, a visual impairment detection mechanism was proposed in [8].
The characteristics of the visual artifacts remaining after syntax analysis were examined and, by means of local image statistics analysis, both the detection distance and the detection probability were improved. The method, however, requires the visual analysis of the whole decoded NALU. In the considered scenario, a NALU can contain a whole frame. The smallest unit in which an error can be detected without additional mechanisms (assuming UMTS as the underlying system) is the RLC (Radio Link Control) Packet Data Unit (PDU). The information from the RLC layer can be exploited by the decoder to reduce the search area for visual artifacts.

In this article, we present a cross-layer mechanism capable of detecting visual impairments in the decoded frame. We limit the search region by exploiting the information coming from the lower layers.

The paper is organized as follows. In Section II, the visual detection mechanism is briefly described. Section III presents the proposed cross-layer mechanism. The simulation scenario is described in Section IV. The results are evaluated in Section V. Final remarks and conclusions are provided in Section VI.

1-4244-1284-6/07/$25.00 ©2007 IEEE
1st International Workshop on Cross Layer Design (IWCLD 2007), Sept. 20-21, 2007, Jinan, China
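
The pre-localization idea outlined in the introduction, using RLC-layer information to restrict where the decoder has to look for errors, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the receiver is assumed to expose one CRC status flag per RLC PDU carrying the NALU, all PDUs are assumed to carry the same fixed payload size, protocol header overhead is ignored, and everything from the first erroneous PDU to the end of the NALU is treated as suspect because entropy-coded data may lose synchronization after an error. The function names and the 40-byte payload size are hypothetical.

```python
from typing import List, Optional, Tuple


def first_damaged_offset(pdu_ok: List[bool], payload_bytes: int) -> Optional[int]:
    """Return the NALU byte offset at which the first erroneous PDU starts.

    pdu_ok[i] is True when the i-th RLC PDU passed its CRC (assumed to be
    reported by the link layer). Returns None when every PDU is correct.
    """
    for index, ok in enumerate(pdu_ok):
        if not ok:
            return index * payload_bytes
    return None


def split_nalu(nalu_len: int, pdu_ok: List[bool],
               payload_bytes: int) -> Tuple[range, range]:
    """Split a NALU into a trusted byte range and a suspect byte range.

    Bytes before the first erroneous PDU are trusted and can be decoded
    normally; bytes from that PDU onward are suspect and are the only ones
    that would need a syntax check or a visual impairment search.
    """
    offset = first_damaged_offset(pdu_ok, payload_bytes)
    if offset is None or offset >= nalu_len:
        # No error inside the NALU: the suspect range is empty.
        return range(0, nalu_len), range(nalu_len, nalu_len)
    return range(0, offset), range(offset, nalu_len)


# Hypothetical example: a 150-byte NALU carried in four 40-byte PDUs,
# with the third PDU failing its CRC.
trusted, suspect = split_nalu(150, [True, True, False, True], 40)
print(f"trusted bytes: {trusted.start}..{trusted.stop - 1}")
print(f"suspect bytes: {suspect.start}..{suspect.stop - 1}")
```

In this example the first 80 bytes are trusted and the remaining 70 bytes form the search region, so a detector would only inspect the macroblocks decoded from byte 80 onward rather than the whole NALU.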