Macroblock Classification for Video Encoder Complexity Management

Yafan Zhao and Iain Richardson
Image Communication Technology Group, The Robert Gordon University, Schoolhill, Aberdeen, UK
Email: 9905854@rgu.ac.uk and i.g.richardson@rgu.ac.uk

ABSTRACT

We describe a system for classifying macroblocks in order to reduce computational complexity in a block-based video encoder. Sequence statistics are used to predict macroblock type prior to coding, enabling selective computation of functions such as motion estimation, DCT and quantization. Results are presented for a "skip mode" prediction algorithm demonstrating that our approach can deliver substantial computational savings at the expense of a small reduction in rate-distortion performance.

1. Introduction

Video CODECs based on the H.263 [1] and MPEG-4 [2] video coding standards are used in a wide range of applications. Software-only CODECs are becoming particularly popular, offering advantages such as flexibility and ease of upgrading and distribution. In real-time and/or power-constrained applications, the performance of a video CODEC may be limited by the amount of processing power available as well as, or rather than, the available transmission bandwidth. In a desktop video conferencing system, the CODEC runs on a general-purpose PC and has to share processing resources with other applications. In a mobile video handset, power consumption is closely related to processor utilisation, and it may be necessary to restrict computational processing in order to maximise battery life. Current software video applications typically control processor utilisation by dropping frames during encoding, leading to intermittent and "jerky" motion in the decoded video sequence. Computational complexity can therefore be a major constraint on coding performance, and it is important to develop methods of managing video CODEC computation.
Previous work on reducing the computational complexity of video CODECs has included many proposals for "fast search" motion estimation algorithms; for example, the popular Nearest Neighbour Search [3] provides coding performance that is close to that of Full Search with greatly reduced complexity. The computational cost of this type of algorithm varies significantly depending on the scene characteristics of the video sequence. Several methods [4, 5] have been proposed to reduce the computational complexity of the Discrete Cosine Transform (DCT) by calculating a subset of the DCT coefficients. Applying these methods to all blocks in an image significantly reduces the quality of the decoded image, and this has led to proposals for algorithms that selectively calculate the DCT based on sequence statistics [6]. In [7] we describe an algorithm that enables flexible and accurate management of DCT complexity in a software video encoder, and a similar approach to motion estimation complexity is presented in [8]. The initial calculation of the Sum of Absolute Differences (SAD) for the zero motion vector is used to reduce motion estimation complexity in [9], and the authors report that this method performs well together with DCT computation reduction. The results presented in these papers show that variable-complexity algorithms can reduce computational complexity, often at the expense of increased distortion.

Many coded macroblocks (MBs) in an inter-coded frame have zero motion vectors (MV) and/or all-zero quantized coefficients (QCoeff). If the probable mode of a macroblock can be predicted at an early stage of encoding, computationally intensive processing such as motion estimation, DCT/IDCT, quantization and entropy coding may be avoided for some MBs, saving considerable computational effort. In this paper we examine typical distributions of coded macroblocks and describe our approach to macroblock pre-classification.
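The zero-MV SAD test described above can be sketched as follows. This is a minimal illustration, not the algorithm of [9]: the function names and the threshold value are assumptions introduced here for clarity, and a practical encoder would derive the threshold from sequence statistics rather than use a fixed constant.

```python
def sad_zero_mv(current, reference):
    """Sum of Absolute Differences between a macroblock in the current
    frame and the co-located (zero motion vector) block in the
    reference frame. Both arguments are 2-D lists of luma samples."""
    return sum(abs(c - r)
               for row_c, row_r in zip(current, reference)
               for c, r in zip(row_c, row_r))

def predict_skip(current, reference, threshold=128):
    """If the zero-MV SAD is small, predict that the macroblock will be
    skipped, so that motion estimation, DCT and quantization can be
    bypassed. The threshold of 128 is illustrative only."""
    return sad_zero_mv(current, reference) < threshold
```

A low zero-MV SAD indicates that the macroblock is almost unchanged from the reference frame, which is exactly the case in which "skip" coding is likely.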
We describe an algorithm for "skip mode" prediction and present results that demonstrate its performance.

2. Macroblock Classification

2.1 Overview

Four video sequences ("Carphone", "Mother and Daughter", "Foreman" and "Claire") were encoded using the H.263 TMN-8 reference model (baseline mode) [10]. Coded MBs in P-pictures were categorised into four classes: "skip" (zero MV, no non-zero QCoeff), "MV=0" (zero MV, some non-zero QCoeff), "QCoeff=0" (non-zero MV, all-zero QCoeff) and "other" (non-zero MV and non-zero QCoeff).

[Figure 1: Distribution of the four MB types (skipped, MV=0, QCoeff=0, other). (a) "Carphone" at Q=8 and Q=12; (b) "Claire", "Mother and Daughter" and "Foreman" at Q=8.]