IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 10, NO. 7, JULY 2001 977
Foveated Video Compression
with Optimal Rate Control
Sanghoon Lee, Marios S. Pattichis, and Alan Conrad Bovik, Fellow, IEEE
Abstract—Recently, foveated video compression algorithms
have been proposed which, in certain applications, deliver
high-quality video at reduced bit rates by seeking to match the
nonuniform sampling of the human retina. We describe such a
framework here where foveated video is created by a nonuniform
filtering scheme that increases the compressibility of the video
stream. We maximize a new foveal visual quality metric, the foveal
signal-to-noise ratio (FSNR), to determine the best compression
and rate control parameters for a given target bit rate. Specifically,
we establish a new optimal rate control algorithm for maximizing
the FSNR using a Lagrange multiplier method defined on a curvi-
linear coordinate system. For optimal rate control, we also develop
a piecewise R–D (rate–distortion)/R–Q (rate–quantization)
model. A fast algorithm for searching for an optimal Lagrange
multiplier is subsequently presented. For the new models, we
show how the reconstructed video quality is affected, where the
FPSNR is maximized, and demonstrate the coding performance
for H.263, H.263+, H.263++/MPEG-4 video coding. For H.263/MPEG video
coding, a suboptimal rate control algorithm is developed for fast,
high-performance applications. In the simulations, we compare
the reconstructed pictures obtained using optimal rate control
methods for foveated and normal video. We show that foveated
video coding using the suboptimal rate control algorithm delivers
excellent performance under 64 kb/s.
Index Terms—Digital video, foveation, image compression, rate
control, video compression.
Manuscript received November 18, 1998; revised March 28, 2001. This work
was supported in part by Bell Labs, Lucent Technologies, Texas Instruments
Inc., and by the Texas Advanced Technology Program. The associate editor
coordinating the review of this manuscript and approving it for publication
was Dr. Boon-Lock Yeo.
S. Lee is with Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974
USA.
M. S. Pattichis is with the Department of Electrical and Computer
Engineering, University of New Mexico, Albuquerque, NM 87131 USA.
A. C. Bovik is with the Center for Vision and Image Sciences, Department
of Electrical and Computer Engineering, The University of Texas, Austin, TX
78712-1084 USA (e-mail: bovik@ece.utexas.edu).
Publisher Item Identifier S 1057-7149(01)05446-X.
I. INTRODUCTION
VIDEO standards have always been associated with particular
ranges of bit rates. In order to maximize the video
compression ratio for a given video standard, it is necessary to
use the maximum degree of quantization, typically determined
by a quantization parameter (QP) that is provided by the
standard. At the maximum compression setting, the compressed bit
rate achieves the minimum bound on the number of generated
bits, which depends on the codeword density used to represent
the discrete cosine transform (DCT) coefficients, i.e., the
complexity of the input image sequence. By removing unessential
spatial frequency information from a video sequence, the spatial
redundancy decreases, due primarily to the reduction or
elimination of high-frequency DCT coefficients that are deemed to
be visually unimportant. Moreover, motion compensation errors
also tend to be reduced. Because of such spatial/temporal
redundancy reduction, the coding efficiency is improved, and the
minimum bound on the compressed bit rate is reduced. For example,
suppose that a CIF (352 × 288) image sequence is compressed
to 50–1000 kb/s for a QP ranging between 31 and 1. If the bit
rate is further reduced by 40% by selectively removing some
kind of information, then the bit rate range is scaled down to
30–600 kb/s, which is a range of interest for applications where
the transmission rate is severely restricted by the channel
capacity, as in wireless networks or the PSTN.
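The arithmetic behind the bit-rate example above can be checked with a short sketch. The function name and signature are hypothetical and serve only as illustration; the sketch assumes that the 40% reduction applies uniformly across the whole QP range, which is how the example in the text reads.

```python
def scale_bitrate_range(low_kbps, high_kbps, reduction):
    """Scale a (low, high) bit-rate range by a fractional reduction.

    `reduction` is the fraction of bits removed (0.40 for 40%).
    Assumption: the same fraction is removed at every QP setting.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    keep = 1.0 - reduction  # fraction of the original bit rate retained
    return low_kbps * keep, high_kbps * keep


# Values from the example in the text: 50-1000 kb/s reduced by 40%.
low, high = scale_bitrate_range(50, 1000, 0.40)
print(f"{low:.0f}-{high:.0f} kb/s")
```

With the values from the text, the scaled range matches the 30–600 kb/s figure quoted in the paragraph.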
Naturally, reducing the bit rate in this way has the potential
to degrade the visual fidelity in some way, depending on the
type of information that is removed. The use of other transforms,
such as wavelet methods, offers promise, but even in those domains
the limits of additional compression are probably being probed
already, and in any case such methods do not offer the current
advantage of standards compliance. In this paper, we explore the
possibility of increasing compression performance while maintaining,
or even improving, visual fidelity and preserving standards
compliance. We show how this can be done effectively by the selective
reduction of high-frequency coefficients according to a nonuniform
spatial law. The method we explore is called foveation.
The human retina possesses a nonuniform spatial distribution
(resolution) of photoreceptors, with highest density on that part
of the retina aligned with the visual axis: the fovea. The photore-
ceptor density rapidly decreases with distance away (“eccen-
tricity”) from the fovea, hence the local visual frequency band-
width also falls away. Subjective image quality can be measured,
to some degree, as a function of viewing distance, resolution,
picture size, and the contrast sensitivity of the human eye [1],
[2].
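The falloff of local bandwidth with eccentricity described above can be illustrated with a minimal sketch. This section does not give a formula, so the sketch assumes a commonly used contrast-threshold model of the form CT(f, e) = CT0 exp(alpha f (e + e2) / e2), with constants taken from the Geisler and Perry fit (alpha = 0.106, e2 = 2.3 degrees, CT0 = 1/64). The model, the constants, and the function names are all assumptions made for illustration. Setting CT = 1 and solving for f gives the highest spatial frequency (in cycles/degree) that remains visible at eccentricity e.

```python
import math

# Assumed model constants (Geisler-Perry style fit); not from this section.
ALPHA = 0.106     # spatial-frequency decay constant
E2 = 2.3          # half-resolution eccentricity, in degrees
CT0 = 1.0 / 64.0  # minimum contrast threshold


def eccentricity_deg(dx_px, dy_px, viewing_distance_px):
    """Visual angle (degrees) between the fixation point and a pixel.

    dx_px, dy_px: pixel offsets from the fixation point.
    viewing_distance_px: viewing distance expressed in pixel units.
    """
    r = math.hypot(dx_px, dy_px)
    return math.degrees(math.atan(r / viewing_distance_px))


def cutoff_frequency_cpd(ecc_deg):
    """Cutoff frequency (cycles/degree) where the assumed threshold hits 1."""
    return E2 * math.log(1.0 / CT0) / (ALPHA * (ecc_deg + E2))


# Print the cutoff at the fovea and at a few eccentricities.
for e in (0.0, 2.3, 10.0, 20.0):
    print(f"e = {e:5.1f} deg -> cutoff = {cutoff_frequency_cpd(e):6.2f} cycles/deg")
```

Under this assumed model the cutoff halves at e = e2 and keeps decreasing monotonically, which is the property a foveated filter would exploit: regions far from the fixation point can be band-limited more aggressively without a visible loss.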
Recently, very sophisticated commercial eye trackers
(head-mounted or desktop) have become available that either
track an infrared (IR) reflection of the retina, or directly detect
and track the pupil image [3]–[5]. Using an eye tracker, the
point of visual fixation can be determined in real-time and
delivered over an end-to-end visual communication system.
Several real-time/nonreal-time visual communication systems
associated with eye trackers have already been proposed and
demonstrated in the field of visual communications (wireless
video phones, video conferencing systems, web-news, web-ad-
vertisement, and personal communication systems) as well as
virtual reality (virtual space teleconferencing, virtual three-di-
mensional games, computer-aided design, remote telepresence,