1376 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 11, NOVEMBER 2006

High-Throughput Architecture for H.264/AVC CABAC Compression System

Roberto R. Osorio and Javier D. Bruguera, Member, IEEE

Abstract: New image and video coding standards have pushed the limits of compression by introducing new techniques with high computational demands. The Advanced Video Coder (ITU-T H.264, AVC MPEG-4 Part 10) is the latest international standard, and it introduces enhanced features that require new levels of performance. Among the new tools present in AVC, the context-based binary arithmetic coder (CABAC) offers a significant compression advantage over baseline entropy coders. CABAC is meant to be used in AVC's Main and High Profiles, which target broadcast, video storage, and distribution of standard and high-definition content. In these applications, hardware acceleration is needed because the computational load of CABAC is high enough to challenge programmable processors. Moreover, rate-distortion optimization (RDO) increases CABAC's load by two orders of magnitude. In this paper, we present a new, fast architecture for arithmetic coding adapted to the characteristics of CABAC, including optimized use of memory, optimized context management, and fast processing able to encode more than two symbols per cycle. A maximum processing speed of 185 MHz has been obtained for 0.35 µm technology, which is sufficient to encode high-quality video in real time. Some of the proposed optimizations may also be applied to software implementations, yielding significant improvements.

Index Terms: Application specific integrated circuits (ASICs), arithmetic codes, entropy codes, H.264/AVC, video coding.

I. INTRODUCTION

THE new H.264/AVC video coding standard provides a significant compression gain over previous standards [1]. The advanced video coder (AVC) will be present in the new emerging high-definition DVD formats, digital video broadcast, and some HDTV and pay-per-view systems.
AVC implements a set of novel tools, including, but not restricted to, quarter-sample-accurate motion compensation, multiple reference pictures, variable block-size motion compensation and transformation, spatial prediction for intra coding, an in-the-loop de-blocking filter, and context-adaptive arithmetic coding.

The use of the tools is organized in profiles. The Baseline Profile targets low-complexity scenarios such as video conferencing and mobile video. The Extended Profile is intended to be used for video streaming. For high-quality video storage and distribution, the Main and High Profiles were defined. The Main Profile focuses on high-quality, high-efficiency compression, while the High Profile allows higher bit depth and color fidelity. This study targets the latter two profiles.

Manuscript received November 25, 2005; revised March 20, 2006. This work was supported in part by the Ministry of Science and Technology of Spain under Contract MEC-FEDER TIN2004-07797-C02. This paper was recommended by Associate Editor L.-G. Chen.
The authors are with the Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela 15782, Spain (e-mail: roberto@dec.usc.es; bruguera@dec.usc.es).
Digital Object Identifier 10.1109/TCSVT.2006.883508

Fig. 1. Number of symbols processed per second at 25 fps. (a) Without RDO. (b) With RDO.

A context-based adaptive binary arithmetic coder (CABAC) [2] is used in H.264/AVC. CABAC offers compression results that are 10%–15% better than those obtained with the baseline CAVLC (context adaptive variable length coder) entropy coder. For the Main and High Profiles, CABAC is the preferred entropy compression method. However, its computational demands are high, challenging general purpose processors in high-quality scenarios.

Rate-distortion optimization (RDO) offers enhanced compression at the cost of a larger number of operations.
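As a rough illustration of why RDO multiplies the entropy coder's workload, the sketch below picks a coding mode by minimizing a Lagrangian cost J = D + λR. This cost function is the formulation commonly used for rate-distortion optimization and is an assumption here, not something this paper specifies. The `Trial` record, the mode labels, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    mode: str          # hypothetical mode label
    distortion: float  # e.g. sum of squared errors after reconstruction
    rate_bits: int     # bits the entropy coder produced for this mode
    symbols: int       # binary symbols the entropy coder had to process


def choose_mode(trials, lam):
    """Return the trial with the lowest Lagrangian cost J = D + lam * R (assumed cost)."""
    return min(trials, key=lambda t: t.distortion + lam * t.rate_bits)


def workload_ratio(trials, chosen):
    """Symbols coded over all trials divided by the symbols of the chosen mode."""
    return sum(t.symbols for t in trials) / chosen.symbols


# Made-up measurements for a single macroblock.
trials = [
    Trial("inter_16x16", distortion=900.0, rate_bits=40, symbols=60),
    Trial("inter_8x8", distortion=500.0, rate_bits=95, symbols=140),
    Trial("intra_4x4", distortion=450.0, rate_bits=160, symbols=230),
]
best = choose_mode(trials, lam=5.0)
ratio = workload_ratio(trials, best)
```

Every candidate must pass through the entropy coder to obtain its rate, yet only the winner reaches the bitstream. With three hypothetical candidates the coder already processes about three times the symbols it keeps; an encoder that tries many more candidates per block scales this overhead accordingly.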
Several encoding modes are tried in order to find the one that offers the best compression–distortion balance. However, compressing the suboptimal modes as well increases the workload by two orders of magnitude.

The number of operations increases with picture size and quality. Fig. 1(a) shows the number of symbols/s actually encoded for different video sequences ranging from QCIF 4:2:0 to High Definition 4:2:2. As can be seen, millions of symbols/s must be processed. Fig. 1(b) shows that these numbers grow further when RDO is used. New techniques [3], [4] have recently been proposed to simplify RDO by reducing the number of encoding modes that are tried and, hence, the workload for CABAC. However, the computational load is expected to remain high, so hardware acceleration is still needed.

1051-8215/$20.00 © 2006 IEEE

This paper is structured as follows. Section II introduces CABAC and arithmetic coding. Section III presents the proposed algorithm implementation. Section IV focuses on the