A Generalized Linear Model for MPEG-2 Packet-Loss Visibility

Sandeep Kanumuri, Univ. Calif. at San Diego, skanumur@code.ucsd.edu
Pamela C. Cosman, Univ. Calif. at San Diego, pcosman@code.ucsd.edu
Amy R. Reibman, AT&T Labs – Research, amy@research.att.com

Abstract—In this paper, we focus on predicting the visibility of packet losses in MPEG-2 compressed video streams. We develop a generalized linear model (GLM) to predict the probability that a packet loss will be visible to an average viewer. The GLM takes as input parameters that can be easily extracted from the video near the location of the loss, and outputs an estimate of the probability that the loss is visible. We also show how our GLM can be used to classify each loss as visible or invisible. Using this method, we achieve high classification accuracy.

I. INTRODUCTION

When sending compressed video across today's communication networks, packet losses may occur. Network service providers would like to (a) provision their network to keep the packet loss rate below an acceptable level, and (b) monitor the traffic on their network to assure continued acceptable video quality. Unfortunately, each packet loss in video has a different visual impact. For example, one loss may last for a single frame while another may last for many; one may occur in the midst of an active scene while another falls in a motionless area. Thus, the problem of evaluating video quality given packet losses is challenging.

In this paper, we focus on predicting the visibility of packet losses in MPEG-2 compressed video streams. Our goal is to develop a quality monitor that is accurate, operates in real time on every stream in the network, and answers the question, "How are the losses present in this particular stream impacting its visual quality?" Toward this goal, we develop a generalized linear model (GLM) to predict the probability that a packet loss will be visible to an average viewer.
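To make the idea concrete, the sketch below shows how a GLM with a logit link (i.e., logistic regression) maps loss-neighborhood features to a visibility probability, and how that probability can be thresholded into a visible/invisible label. The feature set, coefficients, intercept, and classification threshold are all illustrative assumptions for exposition, not the fitted model from this paper.

```python
import math

def glm_visibility(features, coeffs, intercept):
    """Predict P(loss is visible) with a GLM using a logit link.

    The linear predictor eta = intercept + sum(coeff * feature) is
    passed through the inverse logit (sigmoid) to yield a probability.
    """
    eta = intercept + sum(c * x for c, x in zip(coeffs, features))
    return 1.0 / (1.0 + math.exp(-eta))

def classify(prob, threshold=0.5):
    """Label a loss by thresholding the predicted probability.

    The 0.5 threshold is an assumption; any operating point on the
    probability scale could be chosen.
    """
    return "visible" if prob >= threshold else "invisible"

# Hypothetical per-loss features extracted near the loss location,
# e.g. [frames affected, mean motion magnitude, residual energy]:
features = [4.0, 2.5, 0.8]
coeffs = [0.3, 0.5, 0.2]   # illustrative fitted weights
p = glm_visibility(features, coeffs, intercept=-2.0)
print(p, classify(p))
```

Because the logit link constrains the output to (0, 1), the model's prediction can be read directly as a probability of visibility, which is what makes the GLM a natural fit for this task.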
The GLM takes as input parameters that can be easily extracted from the video near the location of the loss, and outputs an estimate of the probability that the loss is visible. We show how our GLM can be used to classify each loss as visible or invisible.

Much research has been done on developing objective perceptual metrics for compressed video not affected by network losses. While these metrics can predict the quality degradation caused by compression artifacts, they are not equipped to handle the degradation caused by network losses. In earlier efforts to understand the visual impact of packet losses [3], [4], [5], [6], the goal was to understand the average quality of typical videos subjected to average packet loss rates (PLR). Video conferencing is studied in [3] using the average judgement of consumer observers to examine the relative importance of bandwidth, latency, and packet loss. The impact of packet loss on the Mean Opinion Score (MOS) of real-time streaming media was studied in [4] for Microsoft Windows Media encoder 9 (beta version) video. A neural network was trained in [5] on viewer responses on the ITU-R 9-point quality scale, when a single 10-second sequence was subjected to different bandwidths, frame rates, packet loss rates, and I-block refresh rates. Hughes et al. [6] use MOS to evaluate the subjective quality of VBR video subjected to ATM cell loss over a 10-second period. They show that performance is sensitive not only to the magnitude of the bursts, but also to their frequency. "Very different" results were obtained for different sequences.
Other challenges identified by these authors [6] were: (a) many different realizations of both packet loss and video content are necessary to reduce the variability of viewer responses; (b) very low PLRs are difficult to explore because the typical test period (10 seconds) is so short that typical realizations may have no packet losses; (c) the "forgiveness effect" causes viewers to rank a long video based on more recently viewed information.

The joint impact of encoding rate and ATM cell losses on MPEG-2 video quality was studied in [13]. Here the quality of video is judged based on an existing perceptual quality metric and not based on subjective tests. A framework for employing objective perceptual quality assessment methods, evaluating the quality of audio, video and multimedia signals, to model network performance is demonstrated in [15].

In addition, these studies [3], [4], [5], [6] all use MOS to evaluate quality. However, the MOS quality rating methodology has a number of difficulties, as detailed in [7]. First, the impairment (or quality) scales are generally not interpreted by subjects as having equal step-size, and labels in different languages are interpreted differently. Second, subjects tend to avoid the end-points of the scales. Third, the term "quality" itself is actually not a single variable, but has many dimensions.

Thus, we designed and conducted a subjective test that does not use MOS, and explores the impact of each packet loss individually. Viewers are shown MPEG-2 video with injected packet losses, and asked to indicate when they see an artifact in the displayed video. Data is gathered for a total of 1080 packet losses over 72 minutes of MPEG-2 video. "Ground truth" for the probability of visibility of packet losses is defined by the results of our subjective tests. The frequency of visible