1172 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 9, SEPTEMBER 2006

An Area-Efficient Variable Length Decoder IP Core Design for MPEG-1/2/4 Video Coding Applications

Chih-Da Chien, Member, IEEE, Keng-Po Lu, Yu-Min Chen, Jiun-In Guo, Member, IEEE, Yuan-Sun Chu, and Ching-Lung Su

Abstract—This paper proposes an area-efficient variable length decoder (VLD) IP core design for MPEG-1/2/4 video coding applications. The proposed IP core exploits parallel numerical matching in MPEG-1/2/4 entropy decoding to achieve a high data throughput at limited hardware cost. This feature not only improves VLD performance but also allows the supply voltage to be lowered, which reduces power consumption while maintaining sufficient data throughput. Moreover, we propose a partial combinational component enabling approach for minimizing the power consumption of the proposed design. Based on 0.18-µm CMOS technology, the implementation results show that the proposed IP core operates at a 125-MHz clock frequency with a cost of 13 105 gates. In addition, the proposed design consumes 163.4 µW when operated at 12.5 MHz with a 0.9-V supply voltage, which is fast enough for MPEG-1/2/4 real-time decoding of 4CIF video at 30 Hz. Compared with existing designs, the proposed IP core offers both higher data throughput and lower hardware cost.

Index Terms—Low-power design, MPEG, variable length decoder (VLD).

I. INTRODUCTION

Variable length coding (VLC) is a widely used lossless data compression technique found in many image and video coding systems. In addition, it is often applied together with lossy image compression techniques to increase the data compression ratio. The main idea of variable length coding is to minimize the average codeword length: shorter codewords are assigned to frequently occurring data and longer codewords are assigned to infrequently occurring data.
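The principle of assigning shorter codewords to more frequent data can be illustrated with a minimal sketch. The symbol probabilities below are hypothetical and are not taken from any MPEG table, and the Huffman construction is used only as a familiar way to obtain such a code; the MPEG standards instead specify fixed, predefined VLC tables.

```python
import heapq

def huffman_code_lengths(probs):
    """Return {symbol: codeword length in bits} for a Huffman code over probs."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

# Hypothetical symbol probabilities (illustrative only).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)

# Average codeword length of the variable length code, in bits per symbol.
average_length = sum(probs[s] * lengths[s] for s in probs)

# A fixed-length code for four symbols needs 2 bits per symbol.
fixed_length = 2
print(lengths, average_length, fixed_length)
```

For this assumed distribution, the most frequent symbol receives a 1-bit codeword and the two rarest symbols receive 3-bit codewords, so the average length falls below the 2 bits per symbol of a fixed-length code.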
In each MPEG video coding standard, most of the coding information is encoded as a variable length coded bit stream. Therefore, VLC codes must be decoded fast enough to meet the real-time demands of high-quality video coding systems such as DTV or HDTV.

Certain factors limit the performance that can be achieved in the variable length decoding (VLD) process. One is that VLC codes have no explicit word boundaries. Thus, in the VLD process, the start of the next VLC code is not known until the current one has been decoded. Consequently, the length of the VLC code decoded in the current cycle must be fed back in the next cycle to determine the start of the next VLC code. This creates a feedback loop between two adjacent cycles of the VLD process, which limits how far its speed can be improved.

Manuscript received June 19, 2005; revised May 2, 2006. This paper was recommended by Associate Editor K.-H. Tzou.
C.-D. Chien and K.-P. Lu are with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan, R.O.C.
Y.-M. Chen and Y.-S. Chu are with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan, R.O.C.
J.-I. Guo is with the Department of Computer Science and Information Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan, R.O.C. (e-mail: jiguo@cs.ccu.edu.tw).
C.-L. Su is with the Department of Electronics Engineering, National Yunlin University of Science and Technology, YunLin 640, Taiwan, R.O.C.
Digital Object Identifier 10.1109/TCSVT.2006.881873

Apart from the speed limitation, low-power design is also an important requirement in many portable video coding applications.
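The word-boundary dependency described above can be sketched in software. This is only an illustration under stated assumptions: the code table is a hypothetical prefix-free code, not an MPEG table, and the loop models the serial dependency rather than the hardware proposed in this paper.

```python
# Hypothetical prefix-free code table (illustrative only, not an MPEG table).
CODE_TABLE = {"0": "a", "10": "b", "110": "c", "111": "d"}
MAX_LEN = max(len(code) for code in CODE_TABLE)

def decode(bits):
    """Decode a bit string one codeword at a time.

    Returns the decoded symbols and the bit offset where each codeword started.
    """
    symbols, starts = [], []
    pos = 0
    while pos < len(bits):
        # Try candidate lengths until one matches a codeword in the table.
        for length in range(1, MAX_LEN + 1):
            candidate = bits[pos:pos + length]
            if candidate in CODE_TABLE:
                symbols.append(CODE_TABLE[candidate])
                starts.append(pos)
                # Feedback: the start of the next codeword is known only
                # after the length of the current codeword has been found.
                pos += length
                break
        else:
            raise ValueError(f"no valid codeword at bit {pos}")
    return symbols, starts

# Codewords for a, c, b, d, a concatenated without any separators.
bitstream = "0" + "110" + "10" + "111" + "0"
symbols, starts = decode(bitstream)
print(symbols, starts)
```

Each value of `pos` depends on the codeword length found in the previous iteration, so the iterations cannot start independently. This is the software analogue of the feedback loop between adjacent decoding cycles.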
In addition, the growing demand for multimedia players motivates the design of MPEG-1/2/4 bit stream processing cores [1] that offer high flexibility in playing back different video streams on the same hardware.

In order to resolve the above-mentioned problems, we propose an efficient VLD IP core for MPEG-1/2/4 video coding systems. The proposed IP core exploits parallel codeword detection [1] in MPEG-1/2/4 entropy decoding to achieve high performance at limited hardware cost. In addition, we propose two design approaches, called parallel numerical matching (PNM) and partial combinational component enabling (PCCE), to both increase the VLC decoding performance and reduce the power consumption. These approaches can also be applied to CAVLC decoding for MPEG-4 AVC/H.264. The implementation results show that the proposed IP core operates at a 125-MHz clock frequency with a cost of 13 105 gates based on a 0.18-µm CMOS technology. In addition, the proposed design consumes 163.4 µW when operated at 12.5 MHz with a 0.9-V supply voltage, which is fast enough for MPEG-1/2/4 real-time decoding of 4CIF video at 30 Hz. Compared with the designs in [2], [3], and [4], the proposed design reduces the hardware cost by 84%, 55%, and 62%, respectively, in terms of normalized silicon area based on the same scale of CMOS technology.

The rest of this paper is organized as follows. Section II describes previous VLD design approaches proposed in the literature. Section III presents the proposed design approaches for the MPEG-1/2/4 VLD IP core. Section IV presents the implementation, performance evaluation, and comparison of the proposed IP core with other designs. Finally, Section V concludes this paper.

II. PREVIOUS WORKS

A. Difficulty in Designing a High-Throughput VLD

Fig. 1 shows the block diagram of a commonly used entropy decoding system in MPEG video coding standards, which includes VLD and run-length decoding. The VLD process decodes the VLC codes into RunLevel pairs and then passes them to the run-level decoder (RLD), which reconstructs the discrete cosine transform data.

1051-8215/$20.00 © 2006 IEEE

To achieve high-quality video services, the