QUALITY ASSESSMENT FOR H.264 CODED LOW-RATE AND LOW-RESOLUTION VIDEO SEQUENCES

Olivia Nemethova, Michal Ries and Markus Rupp
Institute for Communications and RF Engineering
Vienna University of Technology
Gusshausstr. 25/389
Vienna, Austria
email: {onemeth, mries, mrupp}@nt.tuwien.ac.at

Eduard Siffel
Institute for Telecommunications
Slovak University of Technology in Bratislava
Ilkovicova 3
Bratislava, Slovakia
email: siffel@ktl.elf.stuba.sk

ABSTRACT
This article addresses quality assessment for H.264 coded low-rate and low-resolution video sequences, which are of particular interest for mobile communication. The choice of an appropriate test setup is discussed. The focus is on the influence of the sequence character (spatial and temporal information). The data set is compared to various known video quality metrics. The obtained data set shows that the dynamic and static parameters cannot be separated without considering the character of the sequence, and therefore that a universal metric cannot be created.

KEY WORDS
H.264, subjective perceptual video quality, spatial information, temporal information.

1 Introduction
The deployment of packet-oriented wireless networks enables new mobile multimedia applications such as MMS, video streaming and video conferencing. Such applications introduce several new challenges, as they are delay-sensitive and use relatively high bandwidth. Typical end terminals for such services are mobile phones, which use the low QCIF image resolution (144 × 176 pixels). Due to the bandwidth limitations of wireless transmission, the video stream must be compressed before transmission by means of lossy compression algorithms and/or frame rate reduction. This introduces a quality degradation that can be observed as a distortion of the temporal continuity or of the static picture quality.
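To give a sense of the scale of compression involved, the following minimal sketch estimates the uncompressed bit rate of a QCIF sequence and the compression ratio needed to reach a given target bit rate. It assumes 8-bit samples with YUV 4:2:0 chroma subsampling and 1 kbit/s = 1000 bit/s; neither assumption is stated in the text. The function names are illustrative only. The target bit rates and the 15 fps frame rate are those used in the survey of Section 2.

```python
# Rough estimate of the uncompressed bit rate of a QCIF video sequence.
# Assumptions (not taken from the paper): 8 bits per sample and
# YUV 4:2:0 chroma subsampling, i.e. 1.5 samples per pixel on average.

QCIF_WIDTH = 176
QCIF_HEIGHT = 144
BITS_PER_SAMPLE = 8
SAMPLES_PER_PIXEL_420 = 1.5  # 1 luma sample + 2 chroma samples shared by 4 pixels


def raw_bit_rate(width, height, fps):
    """Return the uncompressed bit rate in bit/s."""
    return width * height * SAMPLES_PER_PIXEL_420 * BITS_PER_SAMPLE * fps


def compression_ratio(width, height, fps, target_bps):
    """Return how many times the raw stream exceeds the target bit rate."""
    return raw_bit_rate(width, height, fps) / target_bps


if __name__ == "__main__":
    fps = 15
    raw = raw_bit_rate(QCIF_WIDTH, QCIF_HEIGHT, fps)
    print(f"raw QCIF at {fps} fps: {raw / 1e6:.2f} Mbit/s")
    for kbps in (128, 64, 32):
        ratio = compression_ratio(QCIF_WIDTH, QCIF_HEIGHT, fps, kbps * 1000)
        print(f"target {kbps} kbit/s -> compression ratio about {ratio:.0f}:1")
```

Under these assumptions the raw stream is roughly 4.6 Mbit/s at 15 fps, so even the highest target rate of 128 kbit/s requires a compression ratio of about 36:1, and 32 kbit/s requires about 143:1.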
Several tradeoffs are required between the quality and the amount of resources needed for the various video applications.

While many researchers [1, 2] focus on relatively simple but objective measures such as the Peak Signal-to-Noise Ratio (PSNR), newer results decide which degradation is (still) acceptable for the user by assessing and estimating the user's subjective perceptual quality evaluation, given by a so-called mean opinion score (MOS). MOS is a metric well known from the subjective perceptual quality evaluation of audio sequences. To obtain such MOS for the one-directional transmission of video sequences, several human observer test methods are described in [3].

Performing a subjective video quality survey requires much effort, making it impossible to perform anytime and anywhere. Therefore, several metrics have been proposed (e.g. [4, 5]) to estimate MOS values from video sequence parameters that are either set at the sender or calculated at the receiver using a model of human visual perception.

However, subjective quality evaluation is a psycho-visual experiment, and thus the results strongly depend on the type and character of the sequence itself. The intention of this paper is to demonstrate the dependency of the MOS on the sequence character by means of a survey, and to compare the obtained results with known metrics.

In Section 2 the sequences selected for evaluation are described, as well as the setup of the survey which we performed to obtain MOS values. In Section 3 some known metrics for video quality are applied to our set of data and evaluated. The results are further interpreted in Section 4, with a focus on the video sequence characteristics. Section 5 contains the conclusions and some final remarks.

2 VIDEO QUALITY SURVEY
For the tests we selected four video sequences, each of ten-second duration, with QCIF resolution. Two of them (akiyo, foreman) are well-known professional test sequences obtained by a static camera.
In the akiyo sequence a female moderator reads the news, moving only her lips and eyes. The foreman sequence contains a monologue of a man moving his head dynamically, and at the end of the sequence there is a continuous scene change. Soccer and panorama are both sequences with permanent camera movement. Soccer is a professional sequence in which the entire picture is moving: the players and the ball quickly, the background rather slowly. Panorama is a non-professional sequence containing uniform, smooth and relatively slow movement of the scene. Snapshots of these sequences are depicted in Figure 1.

Copyright 2004 IASTED. Published in the proceedings of CIIT, St. Thomas, US Virgin Islands, November 22-24, 2004

We used all nine possible combinations of the bit rates 128 kbps, 64 kbps and 32 kbps and the frame rates 15 fps, 10 fps,