186 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 2, FEBRUARY 2008
Fast Inter-Mode Selection in the H.264/AVC Standard
Using a Hierarchical Decision Process
Andy Chia Woo Yu, Student Member, IEEE, Graham R. Martin, Member, IEEE, and
Heechan Park, Student Member, IEEE
Abstract—A complexity reduction algorithm tailored for the
H.264/AVC encoder is described. It aims to alleviate the computa-
tional burden imposed by Lagrangian rate distortion optimization
in the inter-mode selection process. The proposed algorithm is
described as a hierarchical structure comprising three levels. Each
level targets different types of macroblocks according to the com-
plexity of the search process. Early termination of mode selection
is triggered at any of the levels to avoid a full cycle of Lagrangian
examination. The algorithm is evaluated using a wide range of
test sequences of different classes. The results demonstrate a
reduction in encoding time of at least 40%, regardless of the class
of sequence. Despite the reduction in computational complexity,
picture quality is maintained at all bit rates.
Index Terms—Fast algorithm, H.264/AVC standard, inter-
frame, Lagrangian optimization, mode selection.
I. INTRODUCTION
Since the early 1990s, developments in digital video coding
standards have played a pivotal role in the commercial
success of multimedia communications. In particular, the
MPEG-x and H.26x video coding standards have provided
interoperability in heterogeneous network systems [2]. Consid-
ering that transmission bandwidth is still a valuable commodity,
ongoing developments in video coding seek to achieve addi-
tional compression while maintaining a reasonable level of
signal-to-noise ratio. The H.264/AVC coding standard, a Joint
Video Team (JVT) collaborative project involving the ITU-T
Video Coding Experts Group (VCEG) and the ISO/IEC Moving
Picture Experts Group (MPEG), is a recent development aimed
at addressing this issue [6], [22]. Simulations using the Joint
Model (JM) reference software (a laboratory implementation of
the standard) indicate that compression efficiency outperforms
that of other standards (e.g., MPEG-4 Part 2) by
a factor of two [1], [4]. This is achieved without sacrifice of
picture quality. The standard is of particular appeal to applica-
tions that allow higher processing delays, for instance in media
storage and broadcasting. The compression efficiency achieved
by the H.264/AVC coding technique is attributed to a number
of advanced coding features, for example, multi-mode selection,
Lagrangian rate-distortion optimization, integer discrete
cosine transform (DCT) and quantization and context-based
entropy coding. Of all the new features introduced, multi-mode
selection for inter-frame coding requires the most computation,
accounting for 60%–80% of the entire encoding time.

Manuscript received September 4, 2006; revised March 14, 2007. This paper
was recommended by Associate Editor I. Ahmad.
A. C. W. Yu is with the Department of Electrical and Electronic Engineering,
Imperial College London, London SW7 2AZ, U.K. (e-mail: andycyu@imperial.ac.uk).
G. R. Martin and H. Park are with the Department of Computer Science,
University of Warwick, Coventry CV4 7AL, U.K. (e-mail: grm@dcs.warwick.ac.uk;
heechan@dcs.warwick.ac.uk).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2007.913970

Fig. 1. Two types of block decompositions for motion estimation:
macroblock-type (top) and subblock-type (bottom).

Inter-frame mode selection is a combination of block-based
motion estimation and rate-distortion optimization. Blocks are
categorized as macroblocks and subblocks. Macroblocks may
be of size 16×16, 16×8, or 8×16 pixels. Macroblocks may also
be decomposed into subblocks; the subblock type comprises a
number of smaller partition sizes ranging from 4×4 to 8×8
pixels. Fig. 1 illustrates the partition sizes available for
inter-frame coding.
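
The partition sizes described above can be tabulated with a short sketch. It is illustrative only: the function and constant names are hypothetical and do not come from the JM reference software, and the intermediate subblock sizes 8×4 and 4×8 are taken from the H.264/AVC standard rather than from the text, which states only the 4×4 to 8×8 range. For each partition size, the sketch counts how many blocks (and hence how many motion searches) one 16×16 macroblock would contain if that size were used throughout.

```python
# Illustrative sketch; all names here are hypothetical, not JM identifiers.
MACROBLOCK_SIDE = 16  # a macroblock covers 16x16 pixels
SUBBLOCK_SIDE = 8     # a macroblock splits into four 8x8 subblocks

# Macroblock-type partitions (width, height) named in the text.
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16)]

# Subblock-type partitions, from 8x8 down to 4x4.
# Assumption: 8x4 and 4x8 are the intermediate sizes (per the standard).
SUBBLOCK_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_per_region(side, width, height):
    """Count the width x height blocks that tile a square of the given side."""
    if side % width or side % height:
        raise ValueError(f"{width}x{height} does not tile a {side}x{side} region")
    return (side // width) * (side // height)


def partition_table():
    """Map each partition label to its block count per macroblock."""
    table = {}
    for width, height in MACROBLOCK_PARTITIONS:
        table[f"{width}x{height}"] = blocks_per_region(MACROBLOCK_SIDE, width, height)
    subblocks_per_macroblock = blocks_per_region(
        MACROBLOCK_SIDE, SUBBLOCK_SIDE, SUBBLOCK_SIDE
    )
    for width, height in SUBBLOCK_PARTITIONS:
        per_subblock = blocks_per_region(SUBBLOCK_SIDE, width, height)
        table[f"sub {width}x{height}"] = per_subblock * subblocks_per_macroblock
    return table


if __name__ == "__main__":
    for label, count in partition_table().items():
        print(f"{label:>8}: {count} block(s) per macroblock")
```

The block count grows from one (16×16) to sixteen (4×4 in every subblock), which gives a rough sense of why an exhaustive search over all partition sizes is costly.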
The selection of block decomposition structures, as well as
other hybrid coding techniques, provides a set of options/modes
for the encoder. The possible prediction modes [6] for inter-
frame coding are
(1)
where SKIP is the mode that directly copies the content of the
macroblock in the same position in the reference frame (de-
noted as the co-located macroblock), without the need for mo-
tion compensation. In contrast, the INTRA mode utilizes neigh-
boring pixels to perform a spatial prediction instead of using
motion estimation. In the H.264/AVC standard, a Lagrangian
rate-distortion optimization process is employed to select the
1051-8215/$25.00 © 2008 IEEE