186 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 2, FEBRUARY 2008

Fast Inter-Mode Selection in the H.264/AVC Standard Using a Hierarchical Decision Process

Andy Chia Woo Yu, Student Member, IEEE, Graham R. Martin, Member, IEEE, and Heechan Park, Student Member, IEEE

Abstract—A complexity reduction algorithm tailored for the H.264/AVC encoder is described. It aims to alleviate the computational burden imposed by Lagrangian rate distortion optimization in the inter-mode selection process. The proposed algorithm is described as a hierarchical structure comprising three levels. Each level targets different types of macroblocks according to the complexity of the search process. Early termination of mode selection is triggered at any of the levels to avoid a full cycle of Lagrangian examination. The algorithm is evaluated using a wide range of test sequences of different classes. The results demonstrate a reduction in encoding time of at least 40%, regardless of the class of sequence. Despite the reduction in computational complexity, picture quality is maintained at all bit rates.

Index Terms—Fast algorithm, H.264/AVC standard, inter-frame, Lagrangian optimization, mode selection.

I. INTRODUCTION

SINCE the early 90s, developments in digital video coding standards have played a pivotal role in the commercial success of multimedia communications. In particular, the MPEG-x and H.26x video coding standards have provided interoperability in heterogeneous network systems [2]. Considering that transmission bandwidth is still a valuable commodity, ongoing developments in video coding seek to achieve additional compression while maintaining a reasonable level of signal-to-noise ratio. The H.264/AVC coding standard, a Joint Video Team (JVT) collaborative project involving the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), is a recent development aimed at addressing this issue [6], [22].
Simulations using the Joint Model (JM) reference software (a laboratory implementation of the standard) indicate that its compression efficiency exceeds that of other standards (e.g., MPEG-4 Part 2 and [...]) by a factor of two [1], [4]. This is achieved without sacrificing picture quality. The standard is of particular appeal to applications that tolerate higher processing delays, for instance media storage and broadcasting.

Manuscript received September 4, 2006; revised March 14, 2007. This paper was recommended by Associate Editor I. Ahmad. A. C. W. Yu is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: andycyu@imperial.ac.uk). G. R. Martin and H. Park are with the Department of Computer Science, University of Warwick, Coventry CV4 7AL, U.K. (e-mail: grm@dcs.warwick.ac.uk; heechan@dcs.warwick.ac.uk). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2007.913970

The compression efficiency achieved by the H.264/AVC coding technique is attributed to a number of advanced coding features, for example, multi-mode selection, Lagrangian rate-distortion optimization, integer discrete cosine transform (DCT) and quantization, and context-based entropy coding. Of all the new features introduced, multi-mode selection for inter-frame coding requires the most computation, accounting for 60%–80% of the entire encoding time.

Fig. 1. Two types of block decompositions for motion estimation: macroblock-type (top) and subblock-type (bottom).

Inter-frame mode selection is a combination of block-based motion estimation and rate-distortion optimization. Blocks are categorized as macroblocks and subblocks. Macroblocks may be of size 16×16, 16×8, or 8×16 pixels. Macroblocks may also be decomposed into subblocks; for example, the P8×8 type comprises a number of smaller partition sizes ranging from 4×4 to 8×8 pixels. Fig.
1 illustrates the partition sizes available for inter-frame coding.

1051-8215/$25.00 © 2008 IEEE

The selection of block decomposition structures, as well as other hybrid coding techniques, provides a set of options/modes for the encoder. The possible prediction modes [6] for inter-frame coding are

$$\text{Mode} \in \{\text{SKIP},\ 16\times16,\ 16\times8,\ 8\times16,\ \text{P}8\times8,\ \text{INTRA}\} \tag{1}$$

where SKIP is the mode that directly copies the content of the macroblock in the same position in the reference frame (denoted as the co-located macroblock), without the need for motion compensation. In contrast, the INTRA mode utilizes neighboring pixels to perform a spatial prediction instead of using motion estimation. In the H.264/AVC standard, a Lagrangian rate-distortion optimization process is employed to select the