Perceptual ARQ for H.264 Video Streaming over 3G Wireless Networks

Paolo Bucciol∗, Enrico Masala† and Juan Carlos De Martin∗†
†Dipartimento di Automatica e Informatica / ∗IEIIT-CNR, Politecnico di Torino
Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Email: [paolo.bucciol|masala|demartin]@polito.it

Abstract: We present a new ARQ algorithm for video streaming over wireless channels. The algorithm takes into account the perceptual and temporal importance of each packet to determine the packet schedule that maximizes the perceived quality. We propose a simple and flexible function that combines the perceptual importance of each packet with its real-time constraints to select the best packet to transmit at each transmission opportunity. The perceptual importance is evaluated using the analysis-by-synthesis technique. The performance of the proposed algorithm has been analyzed by simulating the transmission of H.264-encoded sequences over a 144 kbit/s UMTS channel. We compared the proposed method with time-driven ARQ techniques, using PSNR as the distortion measure. The results show that, for the considered channel conditions, the proposed method delivers gains of up to 2 dB with respect to the time-driven ARQ technique.

I. INTRODUCTION

Video streaming is becoming one of the most interesting wireless applications. Delivering good video quality over wireless channels, however, is difficult because of channel noise and bandwidth limitations. In streaming, an end-to-end delay of a few seconds is acceptable; the effect of transmission errors can therefore be mitigated by combining a playout buffer with Automatic Repeat reQuest (ARQ) techniques that recover lost or corrupted data, since the buffer gives the sender time to retransmit data before it is due for playout.

To optimize bandwidth usage, most multimedia ARQ techniques carefully consider one or both of the main features of multimedia traffic: its time sensitivity and its highly non-uniform perceptual importance.
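The abstract relies on two measurement tools: PSNR as the distortion measure, and analysis-by-synthesis to rate the perceptual importance of each packet. The Python sketch below illustrates both under simplifying assumptions that are not the paper's H.264 setup. Frames are flat lists of 8-bit luma samples, and each packet carries exactly one frame. The hypothetical helper `frame_copy_conceal` replaces a lost frame with the previous one and ignores error propagation. Under these assumptions, the importance of a packet is the distortion, measured against error-free decoding, that its loss alone would cause. The actual procedure is the one described in Section III.

```python
import math


def mse(ref, test):
    """Mean squared error between two equal-length lists of 8-bit samples."""
    if len(ref) != len(test):
        raise ValueError("frames must have the same size")
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr(ref, test, peak=255):
    """PSNR in dB; infinite when the two frames are identical."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def frame_copy_conceal(frames, lost):
    """Hypothetical concealment: one packet = one frame; a lost frame is
    replaced by the previous one (mid-gray if it is the first frame).
    Error propagation to later frames is ignored."""
    out = list(frames)
    out[lost] = frames[lost - 1] if lost > 0 else [128] * len(frames[lost])
    return out


def packet_importance(decoded, conceal, packet_idx):
    """Analysis-by-synthesis: simulate the loss of one packet, decode with
    concealment, and sum the resulting MSE over all frames, relative to
    the error-free decoded frames."""
    lossy = conceal(decoded, packet_idx)
    return sum(mse(ref, out) for ref, out in zip(decoded, lossy))


frames = [[10, 10, 10, 10], [10, 10, 10, 10], [50, 60, 70, 80]]
print(packet_importance(frames, frame_copy_conceal, 2))  # 3150.0
```

In this toy sequence, losing frame 1 costs nothing because frame 0 conceals it perfectly. Losing frame 2 costs an MSE of 3150. This is the kind of non-uniform importance that a perceptual ARQ scheme can exploit.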
The Soft ARQ proposal [1], for instance, avoids retransmitting late data that would no longer be useful at the decoder, thus saving bandwidth. Variants of the Soft ARQ technique have been developed for layered coding [1]. Other techniques assign different priorities to the syntax elements of the compressed multimedia bitstream. In [2], video packets are protected by error-correcting codes whose amount depends on the type of frame to which the packets belong; channel adaptation is achieved by an additional ARQ scheme that favors the most important classes of data. Scheduling of video frames according to the priority given by their position inside the Group of Pictures (GOP) is presented in [3]; the technique is further enhanced by assigning different priorities to the various kinds of data (i.e., motion and texture information) contained in each packet. Optimizing the transmission policy for each individual packet [4], [5] leads to improvements over techniques based on an a priori determination of the average importance of the elements of the compressed bitstream. The low-delay wireless video transmission system presented in [6] includes an ARQ scheme in which a packet is retransmitted only if the distortion caused by its loss is above a given threshold; however, it is not clear how to determine such a threshold optimally. Given a way to associate a distortion value with each packet, rate-distortion optimization of the transmission policies has also been proposed [7], [8].

Our proposal is an ARQ scheme whose retransmission policy is driven by the perceptual and temporal importance of each packet. At each transmission opportunity, the packet to transmit is selected according to an importance value computed for every packet by a simple and flexible formula that combines perceptual importance with the maximum delay constraint.
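A selection rule of this kind can be sketched as follows. This is not the formula of Section III, which is not reproduced here. The score `importance / (slack + eps) ** alpha`, the `Packet` fields, and the parameters `alpha` and `eps` are hypothetical choices. They only show how a perceptual weight and a deadline can be combined in one ranking. As in Soft ARQ [1], packets that can no longer reach the decoder before their playout deadline are discarded rather than sent.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    importance: float  # perceptual weight, e.g. from analysis-by-synthesis
    deadline: float    # playout deadline at the receiver, in seconds
    size_bits: int


def pick_next(queue, now, rate_bps, alpha=1.0, eps=1e-3):
    """Choose the packet to send at this transmission opportunity.

    Hypothetical rule (not the paper's formula): drop packets that would
    arrive after their deadline, then maximize
    importance / (slack + eps) ** alpha, where slack is the time left
    between the packet's arrival and its deadline. Propagation delay is
    ignored. The chosen packet is removed from `queue`; None is returned
    when nothing useful is left to send.
    """
    best, best_score = None, float("-inf")
    for pkt in list(queue):              # iterate over a copy: we may remove
        arrival = now + pkt.size_bits / rate_bps
        slack = pkt.deadline - arrival
        if slack < 0:
            queue.remove(pkt)            # late data is useless (Soft ARQ)
            continue
        score = pkt.importance / (slack + eps) ** alpha
        if score > best_score:
            best, best_score = pkt, score
    if best is not None:
        queue.remove(best)
    return best


queue = [Packet(0, 5.0, 0.005, 1440), Packet(1, 10.0, 2.0, 1440),
         Packet(2, 3.0, 0.5, 1440)]
chosen = pick_next(queue, now=0.0, rate_bps=144_000)
```

At 144 kbit/s, each 1440-bit packet takes 10 ms to send. Packet 0 is therefore dropped as late. Packet 2 is chosen over the perceptually heavier packet 1 because its deadline is much closer. With `alpha=0` the rule reduces to purely perceptual ordering, and packet 1 would be chosen instead. After a negative acknowledgement, the caller would re-insert a packet into the queue so that it competes again at the next opportunity.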
Perceptual importance is evaluated using the analysis-by-synthesis technique described in Section III. In this work we design a new ARQ protocol for wireless transmission and simulate it in its entirety, including the acknowledgement packets, for the specific case of a 144 kbit/s UMTS channel. Both forward and backward channel losses are modeled by a Gilbert model whose parameters are extracted from wireless channel simulations under different fading conditions. The sequences are encoded using the state-of-the-art H.264 video coding standard [9]. Detailed PSNR results are reported, analyzing the influence of various parameters on performance.

This paper is organized as follows. Section II describes the scenario, including the H.264 setup for wireless transmission. In Section III we present an analysis-by-synthesis approach to evaluate the perceptual importance of the video packets, analyze the transmission constraints, and design the ARQ algorithm. Section IV shows performance comparisons with other methods, as well as the influence of some key algorithm parameters. Finally, conclusions are drawn in Section V.

II. VIDEO STREAMING TRANSMISSION

This paper focuses on the streaming of multimedia data from a base station to a mobile device. The aim is to design an ARQ transmission policy for the base station, based on feedback information, while keeping the algorithms on the mobile device as simple as possible.

0-7803-8533-0/04/$20.00 (c) 2004 IEEE IEEE Communications Society
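A two-state Gilbert model of the kind used for the forward and backward channels can be simulated in a few lines. The transition probabilities below are placeholders, because the values the authors extracted from their fading simulations are not given here. The sketch also assumes the simplest form of the model, in which every packet sent in the Bad state is lost and none is lost in the Good state. Under that assumption, the long-run loss rate is p_gb / (p_gb + p_bg) and the mean loss-burst length is 1 / p_bg packets.

```python
import random


def gilbert_losses(n, p_gb, p_bg, seed=None):
    """Loss pattern of n packets over a two-state Gilbert channel.

    p_gb: probability of moving from Good to Bad after each packet.
    p_bg: probability of moving from Bad to Good after each packet.
    Simplest form of the model: a packet is lost if and only if the
    channel is in the Bad state. Returns a list of booleans (True = lost).
    """
    rng = random.Random(seed)
    bad = False                          # start in the Good state
    lost = []
    for _ in range(n):
        lost.append(bad)
        if bad:
            bad = rng.random() >= p_bg   # stay Bad with probability 1 - p_bg
        else:
            bad = rng.random() < p_gb    # enter Bad with probability p_gb
    return lost


def stationary_loss_rate(p_gb, p_bg):
    """Long-run fraction of lost packets under the same assumptions."""
    return p_gb / (p_gb + p_bg)


# Hypothetical parameters; independent instances model the two directions.
forward = gilbert_losses(10_000, p_gb=0.02, p_bg=0.2, seed=1)
backward = gilbert_losses(10_000, p_gb=0.01, p_bg=0.3, seed=2)
```

Using separate instances for the data channel and the acknowledgement channel captures the fact that losses in the two directions are independent. In this setting an ARQ sender must also cope with lost feedback, not only with lost data.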