Low Power H.264 Deblocking Filter Hardware Implementations

Mustafa Parlak and Ilker Hamzaoglu

Abstract — In this paper, we present two efficient, low power H.264 deblocking filter (DBF) hardware implementations that can be used as part of an H.264 video encoder or decoder for portable applications. The first implementation (DBF_4×4) starts filtering the available edges as soon as a new 4×4 block is ready, using a novel edge filtering order to overlap the execution of the DBF module with that of the other modules in the H.264 encoder/decoder. This overlap improves the performance of the H.264 encoder/decoder. The second implementation (DBF_16×16) starts filtering the available edges after a new 16×16 macroblock is ready. Both DBF hardware architectures are implemented in Verilog HDL and synthesized to a 0.18 μm UMC standard cell library. Both implementations can work at 200 MHz and can process 30 VGA (640×480) frames per second. The DBF_4×4 and DBF_16×16 hardware implementations, excluding on-chip memories, are synthesized to 7.4 K and 5.3 K gates, respectively. These gate counts are the lowest among the H.264 DBF hardware implementations presented in the literature, which makes our implementations more cost-effective solutions for portable applications. DBF_16×16 consumes 36% less power than DBF_4×4 on a Xilinx Virtex II FPGA on an Arm Versatile PB926EJ-S development board. Therefore, DBF_4×4 hardware can be used in an H.264 encoder or decoder for which performance is more important, whereas DBF_16×16 hardware can be used in one for which power consumption is more important.

Index Terms — H.264, Video Coding, Deblocking Filter, Hardware Implementation, FPGA, Low Power.

I.
INTRODUCTION

Video compression systems are used in many commercial products, ranging from consumer electronic devices such as digital camcorders and cellular phones to video teleconferencing systems. These applications make video compression systems an essential part of many commercial products. To improve the performance of video compression systems, the H.264 / MPEG4 Part 10 video compression standard, which offers significantly better video compression efficiency than previous standards, was recently developed through the collaboration of the ITU and ISO standardization organizations.

This research was supported in part by the Scientific and Technological Research Council of Turkey (TUBITAK) under the contract 106E153. M. Parlak is with the Department of Electronics Engineering, Sabanci University, Istanbul 34956, Turkey (e-mail: mparlak@su.sabanciuniv.edu). I. Hamzaoglu is with the Department of Electronics Engineering, Sabanci University, Istanbul 34956, Turkey (e-mail: hamzaoglu@sabanciuniv.edu).

The video compression efficiency achieved in the H.264 standard is not the result of any single feature but rather of a combination of a number of encoding tools. As shown in the top level block diagrams of an H.264 encoder and decoder in Figs. 1 and 2, one of these tools is the adaptive deblocking filter (DBF) algorithm [1, 2, 3, 4]. DBF is applied to each Macroblock (MB), a 16×16 pixel array, after inverse quantization and inverse transform. DBF improves the visual quality of decoded frames by reducing the visually disturbing blocking artifacts and discontinuities that arise in a frame from coarse quantization of MBs and from motion compensated prediction. Since the filtered frame is used as a reference frame for motion-compensated prediction of future frames, DBF also increases coding efficiency, resulting in bit rate savings [4].

The DBF algorithm used in the H.264 standard is more complex than the DBF algorithms used in previous video compression standards.
First of all, the H.264 DBF algorithm is highly adaptive and is applied to each edge of all the 4×4 luma and chroma blocks in an MB. Second, it can update 3 pixels in each direction in which the filtering takes place. Third, in order to decide whether the DBF will be applied to an edge, the related pixels in the current and neighboring 4×4 blocks must be read from memory and processed. Because of these complexities, the DBF algorithm can easily account for one-third of the computational complexity of an H.264 video decoder [4].

In this paper, we present two efficient and low power H.264 DBF hardware implementations that can be used as part of an H.264 video encoder or decoder for portable applications [5, 6]. The first implementation (DBF_4×4) starts filtering the available edges as soon as a new 4×4 block is ready by using a novel edge filtering order. The second implementation (DBF_16×16) starts filtering the available edges after a new 16×16 MB is ready. The execution of the DBF_4×4 hardware can be overlapped with the execution of the other modules in an H.264 encoder/decoder to a much greater extent than the execution of the DBF_16×16 hardware can. This overlap improves the performance of the H.264 encoder/decoder. However, because of the nature of the DBF algorithm, the control unit and address generation of the DBF_16×16 hardware are simpler; therefore, the DBF_16×16 hardware has less area and consumes less power than the DBF_4×4 hardware.

Contributed Paper. Manuscript received March 28, 2008. 0098 3063/08/$20.00 © 2008 IEEE. IEEE Transactions on Consumer Electronics, Vol. 54, No. 2, May 2008, p. 808.
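The third source of complexity noted in the introduction, deciding whether an edge is filtered at all, can be illustrated with a small software sketch. It follows the general form of the per-sample test in the H.264 standard: a line of samples across an edge is filtered only when the boundary strength is nonzero and the sample differences stay below two thresholds. This is not the hardware described in this paper. The function name `edge_sample_needs_filtering` and its argument layout are hypothetical, and the thresholds `alpha` and `beta`, which the standard derives from the quantization parameter through lookup tables, are simply passed in as integers here.

```python
from typing import Sequence


def edge_sample_needs_filtering(
    p: Sequence[int],
    q: Sequence[int],
    bs: int,
    alpha: int,
    beta: int,
) -> bool:
    """Decide whether one line of samples across a 4x4 block edge is filtered.

    p holds the samples on one side of the edge, ordered moving away from
    the edge (p[0] is adjacent to it); q holds the samples on the other
    side in the same order. bs is the boundary strength (0 to 4).
    alpha and beta are assumed inputs; their table lookup is not modeled.
    """
    if bs == 0:
        # Boundary strength 0 means the edge is left untouched.
        return False
    p0, p1 = p[0], p[1]
    q0, q1 = q[0], q[1]
    # A small step across the edge with smooth content on both sides is
    # treated as a blocking artifact; a large step is assumed to be a real
    # image edge and is preserved.
    return (
        abs(p0 - q0) < alpha
        and abs(p1 - p0) < beta
        and abs(q1 - q0) < beta
    )


if __name__ == "__main__":
    flat_left = (100, 100, 100, 100)
    flat_right = (104, 104, 104, 104)
    print(edge_sample_needs_filtering(flat_left, flat_right, bs=2, alpha=10, beta=4))
```

Even this simplified test needs two samples from each side of the edge, which is why pixels of both the current and the neighboring 4×4 block have to be fetched before any filtering decision can be made.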