1376 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 11, NOVEMBER 2006

High-Throughput Architecture for H.264/AVC CABAC Compression System

Roberto R. Osorio and Javier D. Bruguera, Member, IEEE

Abstract—New image and video coding standards have pushed the limits of compression by introducing new techniques with high computational demands. The Advanced Video Coder (ITU-T H.264, AVC MPEG-4 Part 10) is the latest international standard, and it introduces enhanced features that require new levels of performance. Among the new tools present in AVC, the context-based binary arithmetic coder (CABAC) offers a significant compression advantage over baseline entropy coders. CABAC is meant to be used in AVC’s Main and High Profiles, which target broadcast and video storage and distribution of standard and high-definition content. In these applications, hardware acceleration is needed because the computational load of CABAC is high, challenging programmable processors. Moreover, rate-distortion optimization (RDO) increases CABAC’s load by two orders of magnitude. In this paper, we present a new, fast architecture for arithmetic coding adapted to the characteristics of CABAC, including optimized memory use and context management, and fast processing able to encode more than two symbols per cycle. A maximum processing speed of 185 MHz has been obtained for 0.35 µm, able to encode high-quality video in real time. Some of the proposed optimizations may also be applied to software implementations, obtaining significant improvements.

Index Terms—Application specific integrated circuits (ASICs), arithmetic codes, entropy codes, H.264/AVC, video coding.

I. INTRODUCTION

The new H.264/AVC video coding standard provides a significant compression gain over previous standards [1]. The advanced video coder (AVC) will be present in the new emerging high-definition DVD formats, digital video broadcast, and some HDTV and pay-per-view systems.
AVC implements a set of novel tools, including but not restricted to quarter-of-sample accurate motion compensation, multiple reference pictures, variable block-size motion compensation and transformation, spatial prediction for intra coding, in-the-loop de-blocking filter, and context-adaptive arithmetic coding.

The use of the tools is organized in profiles. The Baseline Profile targets low-complexity scenarios such as video conferencing and mobile video. The Extended Profile is intended to be used for video streaming. For high-quality video storage and distribution, Main and High Profiles were defined. The Main Profile focuses on high quality and high-efficiency compression, while High Profile allows using higher bit depth and color fidelity. This study targets the latter two profiles.

Manuscript received November 25, 2005; revised March 20, 2006. This work was supported in part by the Ministry of Science and Technology of Spain under Contract MEC-FEDER TIN2004-07797-C02. This paper was recommended by Associate Editor L.-G. Chen.
The authors are with the Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela 15782, Spain (e-mail: roberto@dec.usc.es; bruguera@dec.usc.es).
Digital Object Identifier 10.1109/TCSVT.2006.883508

Fig. 1. Number of symbols processed per second at 25 fps. (a) Without RDO. (b) With RDO.

A context-based adaptive binary arithmetic coder (CABAC) [2] is used in H.264/AVC. CABAC offers compression results that are 10%–15% better than those obtained with the baseline CAVLC (context adaptive variable length coder) entropy coder. For Main and High Profiles, CABAC is preferred as the entropy compression method. However, the computational demands are high, challenging general purpose processors in high-quality scenarios.

Rate-distortion optimization (RDO) offers enhanced compression at the cost of a larger number of operations.
Several encoding modes are tried in order to find the one that offers the best compression–distortion balance. However, compressing the suboptimal modes increases the workload by two orders of magnitude.

The number of operations increases with picture size and quality. Fig. 1(a) shows the number of symbols/s actually encoded for different video sequences ranging from QCIF 4:2:0 to High Definition 4:2:2. As can be seen, millions of symbols/s must be processed. Fig. 1(b) shows larger numbers when RDO is used. New techniques [3], [4] have been recently proposed to simplify RDO, reducing the number of encoding modes that are tried and, hence, the workload for CABAC. However, the computational load is expected to remain high and hardware acceleration is needed.

This paper is structured as follows. Section II introduces CABAC and arithmetic coding. Section III presents the proposed algorithm implementation. Section IV focuses on the