Header for SPIE use

Detection of Distortion in Small Moving Images, Compared to the Predictions of a Spatio-Temporal Model

Kjell Brunnström* a, Bo N. Schenkman a, Albert J. Ahumada Jr. b

a ACREO AB, Electrum 236, SE-164 40 Kista, Sweden
b NASA Ames Research Center, Moffett Field, CA 94035-1000

ABSTRACT

The image sequence discrimination model we use includes stages for optical blurring and retinal light adaptation. It uses two parallel channels, sustained and transient, with different masking rules based on contrast gain control. The performance of the model was studied for two tasks representative of a video communication system, using versions of monochrome H.263 compressed images§. In the first study, five image sequences were presented as pairs of non-compressed and compressed images, to be discriminated in a two-alternative forced-choice task combined with a staircase procedure. The thresholds for each subject were calculated. Analysis of variance showed that the differences between the pictures were significant. The model threshold was close to the average of the subjects for each picture, and the model thus predicted these results quite well. In the second study, the effect of transmission errors on the Internet, i.e. packet losses, was tested with the method of constant stimuli. Both the reference and the comparison images were distorted. The task of the subjects was to judge whether the presented video quality was worse than that of the initially seen reference video. Two different quality levels of the compressed sequences were simulated. The differences in the thresholds among the different video scenes were to some extent predicted by the model. Category scales indicate that detection of distortions and overall quality judgements are based on different psychological processes.

Keywords: video, image quality, spatio-temporal, vision model, H.263, packet loss, Internet

1 INTRODUCTION

The Internet provides a huge infrastructure for connecting people in inexpensive ways over large distances.
Services such as telephony and video conferences are becoming available to the ordinary customer. However, the quality is still poor, especially the image§ quality of video conferences. This is due to bandwidth limitations and packet-based transmission. Bandwidth limitations force high levels of compression, and packet-based transmission reduces control over the packet arrival time. In addition, packets may be lost due to network congestion. Delayed packets can be included or discarded upon arrival, but in either case they introduce errors at the receiving end. Standards for giving priority to certain packets are under development, and these should decrease the delays and losses. However, there will most likely be a cost for using this type of transmission. Customers may then be provided with the quality level that they can afford.

One approach to ensuring good, or at least satisfactory, image quality is to use a visual model to compare reference images of acceptable quality with the transmitted images. In this study we measure viewers' detection of degraded quality and test whether this detection can be predicted by a visual model.

There have been many reports at earlier Electronic Imaging conferences of similar efforts to model the early-vision system and to use the model in technical applications. Examples of models aimed at video applications are those presented by Watson et al. (1999) 1 and by Winkler (1999) 2. The present article describes the success of such a model in predicting the detection of image compression distortion in image sequences. We use the spatio-temporal visual model that was presented earlier by Ahumada et al. (1998) 3, who evaluated its performance for contrast sensitivity and masking. Another study compared the predictions of the model with human performance in target detection in moving infrared images (Brunnström et al. 1999) 4.
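As an illustration of the transmission errors discussed above, the following is a minimal toy simulation, not the procedure used in the experiments. It assumes that each packet is lost independently with a fixed probability, that network delay is exponentially distributed, and that packets arriving after a playout deadline are discarded. The function names and all parameters (`loss_prob`, `mean_delay_ms`, `deadline_ms`) are hypothetical and chosen only for this sketch.

```python
import random


def simulate_transmission(num_packets, loss_prob, mean_delay_ms, deadline_ms, seed=0):
    """Label each packet 'ok', 'lost', or 'late' under the toy assumptions.

    Assumptions (not taken from the paper): independent losses with
    probability loss_prob, exponentially distributed delay with mean
    mean_delay_ms, and discarding of packets later than deadline_ms.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    outcomes = []
    for _ in range(num_packets):
        if rng.random() < loss_prob:
            outcomes.append("lost")  # dropped in the network, e.g. by congestion
            continue
        delay_ms = rng.expovariate(1.0 / mean_delay_ms)
        # A packet that misses the playout deadline is treated as discarded.
        outcomes.append("late" if delay_ms > deadline_ms else "ok")
    return outcomes


def error_rate(outcomes):
    """Fraction of packets that are unusable at the receiver (lost or late)."""
    unusable = sum(1 for outcome in outcomes if outcome != "ok")
    return unusable / len(outcomes)


if __name__ == "__main__":
    result = simulate_transmission(1000, loss_prob=0.05, mean_delay_ms=20.0, deadline_ms=80.0)
    print(f"unusable packets: {error_rate(result):.3f}")
```

In this sketch, both lost and late packets count as errors at the receiving end; a receiver that includes late packets instead of discarding them would need a different error measure.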
One of our intentions in the present experiments was to test this model for video applications.

* Correspondence: Email: kjell.brunnstrom@acreo.se, Tel: +46-8-6327732, Fax: +46-8-7505430, URL: www.acreo.se
Email: bo.schenkman@acreo.se
Email: al@vision.arc.nasa.gov, URL: vision.arc.nasa.gov/~al/ahumada.html
§ We will use the word ‘image’ to denote image sequence, moving image or video, while non-moving images will be denoted by e.g. ‘still image’ or ‘individual frame’.

This image sequence discrimination model has processing stages representing optical blurring and retinal light adaptation. The processing then