IMPACT OF JITTER AND JERKINESS ON PERCEIVED VIDEO QUALITY

Quan Huynh-Thu (a,b) and Mohammed Ghanbari (b)
(a) Psytechnics Ltd, 23 Museum Street, Ipswich, IP1 1HN, UK
(b) University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK

ABSTRACT

In the transmission of digitally compressed video, a very important source of impairments is the delivery of the video stream over an error-prone channel. Partial loss or partial corruption of information can have a dramatic impact on the user's perceived quality, because a localized distortion within a frame can propagate spatially and temporally over subsequent frames. The visual impact of such losses varies between video decoders, depending on their ability to deal with corrupted streams. Some decoders entirely discard a frame that has corrupted or missing information and repeat the previous video frame instead, until the next validly decoded frame is available. Encoders can also drop frames during a sudden increase of motion in the content when the target encoding bit rate is too low. In this paper, we investigate the perceptual impact of repeated and dropped video frames on perceived quality.

1. INTRODUCTION

An objective video quality assessment algorithm is a computational method that analyzes a video signal in order to predict its quality. A visual quality metric has many applications, such as network monitoring and codec performance benchmarking and optimization. However, a quality metric is only useful to the industry if it corresponds closely with subjective ratings. In recent years, significant progress has been made in the development of perceptual quality metrics. Most research on visual quality assessment addresses the impact of spatial distortions or noise; a few recent studies can be found in [1, 2, 3, 4, 5, 6, 7, 8, 9]. Even in the case of motion video, most research still focuses on the impact of spatial distortions within the individual frames of a sequence.
The International Telecommunication Union has recently produced two Recommendations for the objective perceptual picture quality measurement of television transmission [10, 11]. However, these too are limited to measuring quality degradation caused by coding distortions.

The main author is also currently a PhD student at the University of Essex.

A very important source of video impairments is the transmission of the video stream over an error-prone channel. Digitally compressed video is mostly transported over packet-switched networks, where two main types of transmission impairment typically occur: packets can be lost, or they can be delayed to the point where they are not received in time for decoding. Both have the same effect on the decoded information: a portion of the video stream is missing. This partial loss of information can have a dramatic impact on the user's perceived quality, since the loss of a single packet can result in a corrupted macroblock. Because most video encoders use differential predictive coding and motion compensation, corrupted information can subsequently spread both spatially to neighboring blocks and temporally over adjacent frames. The loss or corruption of a single macroblock can therefore affect the stream up to the next resynchronization point (e.g. the next slice or the next intra-coded frame). The visual impact of such losses varies between video decoders depending on their ability to deal with corrupted streams. Some decoders recover poorly from certain errors, whilst others apply error concealment mechanisms of varying complexity. For example, the authors in [12] studied the effect of block-edge impairment and packet loss in video streaming when the video client applies error concealment that creates spatial degradations in the frames affected by packet loss.
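To make the propagation argument concrete, here is a minimal Python sketch under simplifying assumptions introduced only for illustration. Each frame is carried in a single packet with a fixed decoding deadline. The group of pictures contains only I frames and P frames, and each P frame references only the frame immediately before it. The function names `frames_missing` and `affected_frames` and the timing values are hypothetical. The sketch treats lost and late packets identically and lets an error run until the next intra-coded frame.

```python
from typing import List, Optional, Sequence


def frames_missing(arrival_ms: Sequence[Optional[float]],
                   deadline_ms: Sequence[float]) -> List[int]:
    """Return indices of frames whose data is unusable at decode time.

    Assumption (illustrative only): each frame travels in a single packet.
    A frame is missing if its packet was lost (arrival is None) or arrived
    after its decoding deadline; both cases have the same effect.
    """
    return [i for i, (arrival, deadline) in enumerate(zip(arrival_ms, deadline_ms))
            if arrival is None or arrival > deadline]


def affected_frames(frame_types: str, missing: Sequence[int]) -> List[int]:
    """Return indices of all frames corrupted by the missing frames.

    Simplified GOP model: 'I' frames are intra-coded resynchronization
    points, and every 'P' frame is predicted only from the frame just before
    it. B-frames and slice-level resynchronization are ignored.
    """
    affected = set()
    for start in missing:
        affected.add(start)
        # The error propagates along the prediction chain...
        for i in range(start + 1, len(frame_types)):
            if frame_types[i] == "I":
                break  # ...until the next intra-coded frame resets it.
            affected.add(i)
    return sorted(affected)


# Hypothetical example: frame 2 is lost, frame 6 arrives 10 ms after its deadline.
types = "IPPPPIPPP"
arrivals = [10, 50, None, 130, 170, 210, 300, 290, 330]
deadlines = [50 + 40 * i for i in range(len(types))]
missing = frames_missing(arrivals, deadlines)
print(missing)                          # [2, 6]
print(affected_frames(types, missing))  # [2, 3, 4, 6, 7, 8]
```

With B-frames, multiple reference frames or slice-based resynchronization, the affected set would differ. The sketch only illustrates why a single missing packet can corrupt several consecutive frames.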
However, in some applications, decoders entirely discard a frame that has corrupted or missing information and repeat the previous video frame instead, until the next validly decoded frame is available. This situation is entirely different from error concealment scenarios, since one or several complete video frames are missing. No additional spatial degradation is introduced; instead, frames are repeated or dropped. This is referred to as video jitter. Jitter can also occur in the absence of transmission errors. At the decoder side, video frames can be dropped by a playback system that is not efficient enough to decode and display each video frame at the required speed. At the encoder side, frames may be dropped because of a sudden increase