IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 11, NOVEMBER 2011, p. 1719

A Two-Level Classification-Based Approach to Inter Mode Decision in H.264/AVC

Eduardo Martínez-Enríquez, Student Member, IEEE, Amaya Jiménez-Moreno, Miguel Ángel-Pellón, and Fernando Díaz-de-María, Member, IEEE

Abstract: The H.264/AVC standard achieves a high coding efficiency compared to previous standards. However, this gain is accomplished at great computational cost, with mode decision being one of the most demanding subsystems. In this paper, a two-level classification-based approach to the inter mode decision problem is proposed. A first classifier detects SKIP/Direct modes, while a second one is able to decide whether to use a large (16 × 16, 16 × 8, and 8 × 16) or a small mode (8 × 8, 8 × 4, 4 × 8, and 4 × 4). The suggested classifiers are binary and linear, and the input features in the classifiers have been carefully selected. A novel cost function that pays more attention to the most critical samples during the classifier training process has been designed. The experimental results show an average computational savings of 60% of the total encoding time with respect to JM10.2 over a comprehensive variety of sequences and formats. This is achieved with negligible degradation in rate-distortion performance and compares favorably with state-of-the-art fast mode decision methods. Furthermore, the proposed method has been successfully assessed at different levels of complexity reduction.

Index Terms: H.264/AVC, inter mode decision, low complexity, rate-distortion optimization.

I. Introduction

The latest H.264/AVC video coding standard of the joint video team (JVT), formed by ISO/IEC MPEG and ITU-T VCEG, was widely adopted within a few years of completion of the standard.
H.264/AVC achieves a higher compression efficiency than previous video coding standards such as MPEG-2/H.262, H.263, and MPEG-4 part 2, and it is used in a variety of applications including Blu-ray, TV broadcasting, IPTV, mobile multimedia and streaming services, video conferencing, consumer video cameras using the advanced video codec high definition recording format, and personal media players.

Manuscript received May 18, 2010; revised November 21, 2010; accepted February 26, 2011. Date of publication May 2, 2011; date of current version November 2, 2011. This paper was recommended by Associate Editor R. L. de Queiroz.
E. Martínez-Enríquez, A. Jiménez-Moreno, and F. Díaz-de-María are with the Universidad Carlos III de Madrid, Madrid 28911, Spain (e-mail: emenriquez@tsc.uc3m.es; ajimenez@tsc.uc3m.es; fdiaz@tsc.uc3m.es).
M. Ángel-Pellón is with the Engineering Department, VAIO of Europe, Zaventem 1935, Belgium (e-mail: miguelpellon@gmail.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2011.2134010

The H.264/AVC encoder is composed of a set of subsystems that carry out different tasks such as prediction, transformation of the residual block, quantization of the transformed coefficients, and entropy coding. The prediction can be formed from one or two reference pictures that are different from the current one (inter prediction) or from samples in the current slice that have been previously encoded, decoded, and reconstructed (intra prediction). In the case of inter prediction, motion estimation (ME) and motion compensation processes play an outstanding role. In an attempt to reduce the amount of energy in the motion-compensation residual, quarter-pixel accuracy motion vectors (MV) and several reference frames and block sizes (modes) can be used to form the prediction.
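To illustrate why a motion search lowers the energy of the motion-compensation residual, the sketch below runs an integer-pixel full search that minimizes the sum of absolute differences (SAD). It is a generic textbook procedure and not the search implemented in the JM reference software; the function names `sad` and `full_search`, the toy frames, and the search range are assumptions made for this example, and quarter-pixel refinement and multiple reference frames are left out.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between a block of `cur` and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Return (best_sad, (dx, dy)) over all integer displacements inside `ref`."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would leave the frame.
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Toy 8x8 reference frame with a distinct value at every position (assumption).
ref_frame = [[16 * y + x for x in range(8)] for y in range(8)]
# Current frame: the reference content moved one pixel to the right,
# with the left border clamped.
cur_frame = [[ref_frame[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

# Residual cost of a 4x4 block at (2, 2) without and with motion search.
zero_mv_sad = sad(cur_frame, ref_frame, 2, 2, 0, 0, 4)
best_sad, best_mv = full_search(cur_frame, ref_frame, 2, 2, 4, 2)
print(zero_mv_sad, best_sad, best_mv)
```

In this toy case the zero-displacement prediction leaves a nonzero residual, while the searched displacement matches the block exactly. Smaller block sizes allow each region to follow its own displacement, at the price of repeating the search more often.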
A rate-distortion optimization (RDO) method has been developed for the ME process and for the intra and inter mode decision (MD) in order to choose the most efficient representation of each macroblock (MB) according to rate and distortion considerations. Nevertheless, reducing the complexity of ME and MD in an H.264/AVC encoder has become an important issue as a result of the great computational complexity required to determine the optimal MB representation. This paper focuses on inter MD, which accounts for a high percentage of encoder complexity [1].

H.264/AVC offers a wide set of modes for motion compensation. An MB can be partitioned into blocks of 16 × 16, 16 × 8, 8 × 16, or 8 × 8 pixels for inter coding. Each 8 × 8 block, called a submacroblock (subMB), can be further divided into 8 × 4, 4 × 8, or 4 × 4 pixel blocks. The possible MB and subMB partitions are illustrated in Fig. 1. Direct mode (in B slices) and SKIP mode (in P slices) are particular cases of the 16 × 16 MB partition. In Direct mode, no MV is transmitted. In SKIP mode, neither the residual signal, the MV, nor the reference index is transmitted. Frame areas with little detail or little motion can usually be encoded very efficiently using SKIP or Direct modes. Likewise, the Direct 8 × 8 mode is a particular case of the 8 × 8 MB partition. From now on, we refer to this set of modes as inter modes.

This paper is an extension of the work presented in [2] by the same authors. We propose a two-level classification-based method (TLCM) in order to reduce encoder complexity, while maintaining the quality as close as possible to the full search (FS) approach. This complexity saving is achieved by notably reducing the number of inter modes evaluated. The proposed method is able to decide the correct mode for any specific video content and codec configuration (QP, video format, frame rate, and so on) by going through a two-level classification structure.
The input features to each classifier have been carefully selected and a procedure for training the classifiers has been developed, giving rise to a robust and simple approach.

1051-8215/$26.00 © 2011 IEEE
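The two-level structure described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the mode groups come from the text, but the two-element feature vector, the weights and biases, the helper names (`linear_decision`, `candidate_modes`, `best_mode`), and the distortion and rate figures are all hypothetical. The final selection uses the usual Lagrangian cost J = D + λR, which this section does not spell out and is assumed here; how the encoder acts on each classifier decision is likewise an assumption.

```python
SKIP_DIRECT = ["SKIP/Direct"]
LARGE_MODES = ["16x16", "16x8", "8x16"]
SMALL_MODES = ["8x8", "8x4", "4x8", "4x4"]
ALL_MODES = SKIP_DIRECT + LARGE_MODES + SMALL_MODES


def linear_decision(weights, bias, features):
    """Binary linear classifier: True when w . x + b > 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def candidate_modes(features, skip_clf, size_clf):
    """Two-level pruning of the inter modes that will be evaluated."""
    # Level 1: SKIP/Direct versus all other inter modes.
    if linear_decision(*skip_clf, features):
        return SKIP_DIRECT
    # Level 2: large partitions versus small partitions.
    if linear_decision(*size_clf, features):
        return LARGE_MODES
    return SMALL_MODES


def best_mode(modes, distortion, rate, lam):
    """Pick the mode with the lowest Lagrangian cost J = D + lam * R."""
    return min(modes, key=lambda m: distortion[m] + lam * rate[m])


# Hypothetical classifiers given as (weights, bias); real ones would be trained.
skip_clf = ([-1.0, -1.0], 1.0)
size_clf = ([-1.0, -0.5], 4.0)

# Hypothetical features, e.g. a residual-energy proxy and a motion magnitude.
static_mb = [0.2, 0.1]
moderate_mb = [2.0, 1.0]
busy_mb = [5.0, 3.0]

for name, feats in [("static", static_mb), ("moderate", moderate_mb), ("busy", busy_mb)]:
    print(name, candidate_modes(feats, skip_clf, size_clf))

# Made-up distortion and rate values for the modes kept for the moderate MB.
distortion = {"16x16": 900, "16x8": 700, "8x16": 820}
rate = {"16x16": 20, "16x8": 34, "8x16": 30}
chosen = best_mode(candidate_modes(moderate_mb, skip_clf, size_clf), distortion, rate, lam=10)
print(chosen)
```

Only the modes returned by `candidate_modes` are costed, whereas a full search would evaluate every entry of `ALL_MODES`. That reduction in evaluated modes is the source of the complexity saving.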