IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 10, NO. 7, JULY 2001, p. 977

Foveated Video Compression with Optimal Rate Control

Sanghoon Lee, Marios S. Pattichis, and Alan Conrad Bovik, Fellow, IEEE

Manuscript received November 18, 1998; revised March 28, 2001. This work was supported in part by Bell Labs, Lucent Technologies, Texas Instruments Inc., and by the Texas Advanced Technology Program. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Boon-Lock Yeo.
S. Lee is with Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974 USA.
M. S. Pattichis is with the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131 USA.
A. C. Bovik is with the Center for Vision and Image Sciences, Department of Electrical and Computer Engineering, The University of Texas, Austin, TX 78712-1084 USA (e-mail: bovik@ece.utexas.edu).
Publisher Item Identifier S 1057-7149(01)05446-X.

Abstract—Recently, foveated video compression algorithms have been proposed which, in certain applications, deliver high-quality video at reduced bit rates by seeking to match the nonuniform sampling of the human retina. We describe such a framework here, in which foveated video is created by a nonuniform filtering scheme that increases the compressibility of the video stream. We maximize a new foveal visual quality metric, the foveal signal-to-noise ratio (FSNR), to determine the best compression and rate control parameters for a given target bit rate. Specifically, we establish a new optimal rate control algorithm for maximizing the FSNR using a Lagrange multiplier method defined on a curvilinear coordinate system. For optimal rate control, we also develop a piecewise rate–distortion/rate–quantization model. A fast algorithm for searching for an optimal Lagrange multiplier is subsequently presented. For the new models, we show how the reconstructed video quality is affected, where the FPSNR is maximized, and demonstrate the coding performance for H.263/H.263+/H.263++ and MPEG-4 video coding. For H.263/MPEG video coding, a suboptimal rate control algorithm is developed for fast, high-performance applications. In the simulations, we compare the reconstructed pictures obtained using optimal rate control methods for foveated and normal video. We show that foveated video coding using the suboptimal rate control algorithm delivers excellent performance under 64 kb/s.

Index Terms—Digital video, foveation, image compression, rate control, video compression.

I. INTRODUCTION

Video standards have always been associated with particular ranges of bit rates. In order to maximize the video compression ratio for a given video standard, it is necessary to use the maximum degree of quantization, typically determined by a quantization parameter (QP) that is provided by the standard. At the maximum compression setting, the compressed bit rate achieves the minimum bound on the number of generated bits, which depends on the codeword density used to represent the discrete cosine transform (DCT) coefficients, i.e., the complexity of the input image sequence. Removing inessential spatial frequency information from a video sequence decreases the spatial redundancy, due primarily to the reduction or elimination of high-frequency DCT coefficients that are deemed to be visually unimportant. Moreover, motion compensation errors also tend to be reduced. Because of this reduction in spatial and temporal redundancy, the coding efficiency is improved, and the minimum bound on the compressed bit rate is reduced. For example, suppose that a CIF image sequence is compressed to 50–1000 kb/s as the QP ranges from 31 down to 1.
If the bit rate is further reduced by 40% by selectively removing some kind of information, then the bit rate range is scaled down to 30–600 kb/s, which is a range of interest for applications where the transmission rate is severely restricted by the channel capacity, as in wireless networks or the PSTN.

Naturally, reducing the bit rate in this way can degrade the visual fidelity, depending on the type of information that is removed. The use of other transforms, such as wavelet methods, offers promise, but even in those domains the limits of additional compression are probably being probed already, and in any case such transforms do not offer the current advantage of standards compliance. In this paper, we explore the possibility of increasing compression performance while maintaining or even improving visual fidelity and preserving standards compliance. We show how this can be done effectively by the selective reduction of high-frequency coefficients according to a nonuniform spatial law. The method we will explore is called foveation.

The human retina possesses a nonuniform spatial distribution (resolution) of photoreceptors, with the highest density on that part of the retina aligned with the visual axis: the fovea. The photoreceptor density decreases rapidly with distance (“eccentricity”) from the fovea; hence, the local visual frequency bandwidth also falls away. Subjective image quality can be measured, to some degree, as a function of viewing distance, resolution, picture size, and the contrast sensitivity of the human eye [1], [2].

Recently, very sophisticated commercial eye trackers (head-mounted or desktop) have become available that either track an infrared (IR) reflection of the retina, or directly detect and track the pupil image [3]–[5]. Using an eye tracker, the point of visual fixation can be determined in real time and delivered over an end-to-end visual communication system.
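Given a fixation point such as one reported by an eye tracker, the falloff of visual resolution with eccentricity described above can be illustrated with a minimal sketch. This is not the foveation filter developed in this paper: the hyperbolic falloff, the half-resolution eccentricity of 2.3 degrees, the viewing distance expressed in pixels, and all function names are assumptions chosen only for illustration.

```python
import math


def eccentricity_deg(x, y, fix_x, fix_y, viewing_distance_px):
    """Visual angle, in degrees, between the fixation point and pixel (x, y).

    Assumes the viewer looks perpendicularly at the fixation point from a
    distance of viewing_distance_px, measured in pixel units.
    """
    offset_px = math.hypot(x - fix_x, y - fix_y)
    return math.degrees(math.atan(offset_px / viewing_distance_px))


def relative_resolution(ecc_deg, half_res_ecc_deg=2.3):
    """Relative resolution in (0, 1]: 1 at the fovea, 0.5 at half_res_ecc_deg.

    The hyperbolic form and the default constant are illustrative assumptions.
    """
    return half_res_ecc_deg / (half_res_ecc_deg + ecc_deg)


def resolution_map(width, height, fix_x, fix_y, viewing_distance_px):
    """Per-pixel relative resolution for one frame, as a list of rows."""
    return [
        [
            relative_resolution(
                eccentricity_deg(x, y, fix_x, fix_y, viewing_distance_px)
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


if __name__ == "__main__":
    # CIF frame (352 x 288) with fixation at the centre; the viewing
    # distance of 1000 pixels is an arbitrary illustrative choice.
    res = resolution_map(352, 288, 176, 144, 1000.0)
    print("centre:", res[144][176])
    print("corner:", round(res[0][0], 3))
```

A foveation filter could use such a map to set the local cutoff frequency, keeping full bandwidth near the fixation point and discarding more high-frequency content toward the periphery.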
Several real-time/nonreal-time visual communication systems associated with eye trackers have already been proposed and demonstrated in the field of visual communications (wireless video phones, video conferencing systems, web-news, web-advertisement, and personal communication systems) as well as virtual reality (virtual space teleconferencing, virtual three-dimensional games, computer-aided design, remote telepresence,