Cross-Layer Detection of Visual Impairments
in H.264/AVC Video Sequences
streamed over UMTS Networks
Luca Superiori, Olivia Nemethova, Wolfgang Karner, Markus Rupp
Institute of Communications and Radio-Frequency Engineering
Vienna University of Technology, Austria
Gusshausstrasse 25/389, A-1040 Vienna, Austria
Email: {lsuper, onemeth, wkarner, mrupp}@nt.tuwien.ac.at
Abstract—Incorrectly received packets in low-rate video se-
quences result in the loss of considerably large picture areas
that have to be concealed. The performance of error concealment
decreases with the size of the interpolated picture area. The incor-
rectly received packets may still contain some correct information
that can be exploited at the decoder. In this work we propose the
utilization of information from the link layer of UMTS (Universal
Mobile Telecommunications System) at the application layer for
better pre-localization of errors in the bitstream domain. A syntax
check, together with the detection of impairments in the pixel
domain, decides which part of the incorrectly received packets will
be concealed. We evaluate the results using error traces from a
live UMTS network and an H.264/AVC (Advanced Video Coding)
encoded video stream. Apart from the reduced complexity facilitated
by the cross-layer approach, the proposed method gains 1.09 dB
of Y-PSNR compared to a slice rejection mechanism.
Index Terms—Video streaming, UMTS, H.264/AVC, error
resilience, error detection, cross-layer.
I. INTRODUCTION
H.264/AVC (Advanced Video Coding) [1], [2] is currently
the newest and best performing video codec. It has been
developed by the Joint Video Team (JVT) of ITU (Interna-
tional Telecommunication Union) and MPEG (Moving Picture
Expert Group). The codec defines several profiles covering a
wide range of applications. In the following we will refer to
its applications for video transmission over UMTS. The 3GPP
(3rd Generation Partnership Project) specification [3] requires
the mobile terminal to support H.264/AVC in its baseline
profile.
H.264/AVC is a hybrid block-based video codec. Each
frame is subdivided into macroblocks of 16 × 16 pixels. A
macroblock is encoded by means of spatial and temporal
prediction and entropy coding. The encoded video data is
segmented into NALUs (Network Abstraction Layer Units), each
containing a video slice.
For streaming and real-time transmission, each NALU is
further encapsulated into one RTP (Real-time Transport Protocol)/UDP
(User Datagram Protocol)/IP (Internet Protocol) packet.
This protocol stack is recommended in [4] for continuous
media playback at the receiver side. UDP is an unreliable
protocol that does not provide retransmissions. It contains a 16-bit
checksum for error detection. If a UDP packet fails the
checksum test, it is usually discarded [4], [5]. In the considered
scenario of low resolution (usually CIF (352×288 pixels) or even QCIF
(176×144 pixels)) and low bit rate (64-384 kbps), this
could result in losing a considerable part of the frame.
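To illustrate why a failed checksum says that a packet is damaged but not where, the following minimal sketch computes a 16-bit ones' complement checksum of the kind used by UDP. It is an illustration under simplifying assumptions, not the exact UDP procedure: real UDP computes the checksum over a pseudo-header, the UDP header, and the data, whereas here it is applied to an arbitrary byte buffer. The names `internet_checksum` and `passes_checksum` and the test payload are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement of the ones' complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        # fold the carries back into the low 16 bits (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def passes_checksum(data: bytes, checksum: int) -> bool:
    """True if the recomputed checksum matches the transmitted one."""
    return internet_checksum(data) == checksum


# Hypothetical stand-in for the bytes of one packet.
payload = bytes(range(64))
checksum = internet_checksum(payload)

# A single bit error somewhere in the middle of the packet.
corrupted = bytearray(payload)
corrupted[40] ^= 0x04

intact_ok = passes_checksum(payload, checksum)
corrupted_ok = passes_checksum(bytes(corrupted), checksum)
```

The test yields a single pass/fail decision for the whole packet: the 40 bytes preceding the flipped bit are still correct, but a receiver that discards on failure loses them as well.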
The encoded video data preceding the error occurrence in
the damaged packet is still valid and can be used to correctly
reconstruct a part of the encoded frame. In [6] the authors
proposed a smart decoder able to detect syntax errors at the
macroblock level. Packets that fail the UDP checksum
are decoded until a syntax error arises, and only the
macroblocks that follow are concealed. The method showed significant
improvement compared to the classical slice rejection mechanism.
However, it still suffers from a limited detection capability and from a
distance between the error occurrence and the error detection.
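The decode-until-syntax-error strategy described above can be sketched as follows. This is a toy model, not the decoder of [6]: the `SyntaxViolation` exception, the `decode_mb` and `conceal_mb` callbacks, and the chosen error positions are all hypothetical and only serve to show the control flow and the detection distance.

```python
from typing import Callable, List, Optional, Tuple


class SyntaxViolation(Exception):
    """Raised by the (hypothetical) macroblock parser on illegal syntax."""


def decode_damaged_slice(
    num_mbs: int,
    decode_mb: Callable[[int], str],
    conceal_mb: Callable[[int], str],
) -> Tuple[List[str], Optional[int]]:
    """Decode macroblocks in order; on the first syntax error, conceal the rest.

    Returns the per-macroblock results and the index at which the error
    was detected (None if no syntax error arose).
    """
    out: List[str] = []
    for idx in range(num_mbs):
        try:
            out.append(decode_mb(idx))
        except SyntaxViolation:
            out.extend(conceal_mb(i) for i in range(idx, num_mbs))
            return out, idx
    return out, None


# Toy scenario (assumed values): the bit error hits macroblock 5, but the
# bitstream only becomes syntactically invalid at macroblock 7.
ERROR_AT, DETECTED_AT = 5, 7


def toy_decode(idx: int) -> str:
    if idx >= DETECTED_AT:
        raise SyntaxViolation(idx)
    return "ok" if idx < ERROR_AT else "impaired"


def toy_conceal(idx: int) -> str:
    return "concealed"


frame, detected = decode_damaged_slice(10, toy_decode, toy_conceal)
```

In this toy run, macroblocks 0 to 4 are decoded correctly and macroblocks 7 to 9 are concealed, while macroblocks 5 and 6 are decoded from corrupted data without raising a syntax error. These are the visual impairments that a purely syntactic check leaves in the frame.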
In order to enhance the performance of the syntax analysis,
a visual impairment detection mechanism was proposed in [8].
The characteristics of the visual artifacts remaining
after syntax analysis were examined and, by means of local
image statistics analysis, both the detection distance and the
detection probability were improved. The method, however,
requires the visual analysis of the whole decoded NALU. In
the considered scenario a NALU can contain a whole frame.
The smallest unit in which an error can be detected without
additional mechanisms (assuming UMTS as the underlying
system) is the RLC (Radio Link Control) Protocol Data Unit (PDU).
The information from the RLC layer can be exploited
by the decoder to reduce the search area for visual artifacts. In
this article we present a cross-layer mechanism capable of
detecting visual impairments in the decoded frame. We limit
the search region by exploiting the information coming from the
lower layers.
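As a rough illustration of how link-layer information can narrow the search region, the sketch below maps per-PDU error flags to the first macroblock that may be affected. It is not the mechanism presented in Section III. It assumes a fixed PDU payload size, a packet that starts on a PDU boundary, and known byte offsets for the end of each macroblock; the function names and all numeric values are hypothetical.

```python
import bisect
from typing import List, Optional


def first_damaged_byte(pdu_ok: List[bool], pdu_payload_bytes: int) -> Optional[int]:
    """Byte offset (within the packet) of the first PDU flagged as erroneous.

    Assumes fixed-size PDU payloads and a packet aligned to a PDU boundary.
    Returns None if every PDU was received correctly.
    """
    for k, ok in enumerate(pdu_ok):
        if not ok:
            return k * pdu_payload_bytes
    return None


def search_start_mb(mb_end_offsets: List[int], damaged_byte: int) -> int:
    """Index of the first macroblock whose data extends past damaged_byte.

    mb_end_offsets[i] is the exclusive end offset, in bytes, of macroblock i
    (rounded up, since macroblocks are not byte-aligned in the bitstream).
    """
    return bisect.bisect_right(mb_end_offsets, damaged_byte)


# Assumed example values: 40-byte PDU payloads, third PDU erroneous.
pdu_flags = [True, True, False, True, True, True, True]
mb_ends = [30, 75, 120, 200, 260]

damaged = first_damaged_byte(pdu_flags, 40)
start_mb = search_start_mb(mb_ends, damaged)
```

With these values the first damaged byte lies at offset 80, inside macroblock 2, so macroblocks 0 and 1 can be trusted and the search for visual artifacts only needs to begin at macroblock 2.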
The paper is organized as follows. In Section II, the
visual detection mechanism is briefly described. Section III
presents the proposed cross-layer mechanism. The simulation
scenario is described in Section IV. An evaluation of the
results obtained is performed in Section V. Final remarks and
conclusions are provided in Section VI.
1-4244-1284-6/07/$25.00 ©2007 IEEE
1st International Workshop on Cross Layer Design (IWCLD 2007), Sept. 20-21, 2007 Jinan, China