Low Power H.264 Deblocking Filter Hardware Implementations
Mustafa Parlak and Ilker Hamzaoglu
Abstract — In this paper, we present two efficient and low
power H.264 deblocking filter (DBF) hardware
implementations that can be used as part of an H.264 video
encoder or decoder for portable applications. The first
implementation (DBF_4×4) uses a novel edge filtering order to
start filtering the available edges as soon as a new 4×4 block
is ready, so that the execution of the DBF module overlaps with
the execution of the other modules in the H.264
encoder/decoder. This overlap improves the performance of the
H.264 encoder/decoder. The second implementation (DBF_16×16)
starts filtering the available edges after a new 16×16
macroblock is ready. Both DBF hardware architectures are
implemented in Verilog HDL and synthesized to a 0.18 μm UMC
standard cell library. Both implementations work at 200 MHz
and can process 30 VGA (640×480) frames per second.
Excluding on-chip memories, the DBF_4×4 and DBF_16×16
hardware implementations are synthesized to 7.4 K and 5.3 K
gates, respectively. These gate counts are the lowest among the
H.264 DBF hardware implementations presented in the literature,
which makes our implementations more cost-effective solutions
for portable applications. On a Xilinx Virtex II FPGA on an Arm
Versatile PB926EJ-S development board, DBF_16×16 consumes
36% less power than DBF_4×4. Therefore, DBF_4×4 is better
suited to an H.264 encoder or decoder for which performance is
more important, whereas DBF_16×16 is better suited to an
H.264 encoder or decoder for which power consumption is more
important.
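The throughput figure above can be turned into a per-MB cycle budget with simple arithmetic. The sketch below is a back-of-envelope check only: it assumes a 16×16 MB grid over the VGA frame and ignores any overhead, so it gives the average number of cycles available per MB at 200 MHz, not the number of cycles either DBF implementation actually uses.

```python
# Back-of-envelope cycle budget for 30 VGA (640x480) frames/s at 200 MHz.
# Assumption: the frame is an exact grid of 16x16 macroblocks, no overheads.
CLOCK_HZ = 200_000_000
WIDTH, HEIGHT, FPS = 640, 480, 30
MB_SIZE = 16  # a macroblock (MB) is a 16x16 luma pixel array

mbs_per_frame = (WIDTH // MB_SIZE) * (HEIGHT // MB_SIZE)  # 40 * 30
mbs_per_second = mbs_per_frame * FPS
cycles_per_mb = CLOCK_HZ // mbs_per_second  # average budget, rounded down

print(mbs_per_frame, mbs_per_second, cycles_per_mb)  # 1200 36000 5555
```

Any DBF that finishes an MB in fewer than about 5,555 cycles on average meets this rate; how much of that budget each design uses is not stated here.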
Index Terms — H.264, Video Coding, Deblocking Filter,
Hardware Implementation, FPGA, Low Power.
I. INTRODUCTION
Video compression systems are used in many commercial
products, ranging from consumer electronic devices such as
digital camcorders and cellular phones to video
teleconferencing systems, which makes video compression an
indispensable part of these products. To improve the
performance of video compression systems, the H.264 / MPEG-4
Part 10 video compression standard, which offers significantly
better compression efficiency than previous standards, was
recently developed jointly by the ITU and ISO standardization
organizations.
This research was supported in part by the Scientific and Technological
Research Council of Turkey (TUBITAK) under the contract 106E153.
M. Parlak is with the Department of Electronics Engineering, Sabanci
University, Istanbul 34956, Turkey (e-mail: mparlak@su.sabanciuniv.edu).
I. Hamzaoglu is with the Department of Electronics Engineering, Sabanci
University, Istanbul 34956, Turkey (e-mail: hamzaoglu@sabanciuniv.edu).
The video compression efficiency achieved in H.264
standard is not a result of any single feature but rather a
combination of a number of encoding tools. As shown in the
top-level block diagrams of an H.264 encoder and decoder in
Figs. 1 and 2, one of these tools is the adaptive deblocking
filter (DBF) algorithm [1, 2, 3, 4]. DBF is applied to each
macroblock (MB), a 16×16 pixel array, after inverse
quantization and inverse transform. DBF improves the visual
quality of decoded frames by reducing the visually disturbing
blocking artifacts and discontinuities caused by coarse
quantization of MBs and by motion-compensated prediction.
Since the filtered frame is used as a reference frame for
motion-compensated prediction of future frames, DBF also
increases coding efficiency, resulting in bit rate savings [4].
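Applying DBF to an MB means visiting the boundaries of its 4×4 blocks. The sketch below lists every 4-sample edge segment that may be filtered in one MB, assuming 4:2:0 chroma sampling (one 8×8 Cb and one 8×8 Cr block per MB). Within each plane it follows the standard's basic order, vertical edges left to right and then horizontal edges top to bottom; it is not the novel order used by DBF_4×4, and the helper name `mb_edges` is hypothetical.

```python
def mb_edges():
    """Return (plane, direction, offset, segment) for every 4-sample edge
    segment H.264 may filter in one 4:2:0 macroblock (hypothetical helper).

    offset 0 is the MB's own left (vertical) or top (horizontal) boundary;
    it is skipped at picture borders or when filtering is disabled.
    """
    edges = []
    for plane, size in (("Y", 16), ("Cb", 8), ("Cr", 8)):  # samples per side
        for direction in ("vertical", "horizontal"):  # vertical edges first
            for offset in range(0, size, 4):  # one edge per 4x4 block boundary
                for segment in range(size // 4):  # 4 samples per segment
                    edges.append((plane, direction, offset, segment))
    return edges


edges = mb_edges()
print(len(edges))                            # 48
print(sum(1 for e in edges if e[0] == "Y"))  # 32
```

Luma contributes 32 segments (4 vertical and 4 horizontal edges, each 16 samples long) and each chroma plane 8, so a single MB can involve up to 48 separate filtering decisions.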
The DBF algorithm used in the H.264 standard is more
complex than the DBF algorithms used in previous video
compression standards. First, the H.264 DBF algorithm is
highly adaptive and is applied to each edge of all the 4×4 luma
and chroma blocks in an MB. Second, it can update up to 3
pixels on each side of the edge being filtered. Third, in order to
decide whether the DBF will be applied to an edge, the related
pixels in the current and neighboring 4×4 blocks must be read
from memory and processed. Because of these complexities,
the DBF algorithm can easily account for one-third of the
computational complexity of an H.264 video decoder [4].
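The second and third points can be illustrated for one line of samples across an edge. The sketch below implements the H.264 per-line decision (`filterSamplesFlag` in the standard) and the boundary-strength-4 (bS = 4) luma filter for the p side, which is the case that rewrites up to 3 samples. The alpha and beta thresholds, which the standard looks up from QP-indexed tables, are passed in directly; the bS < 4 filter with tC clipping and the chroma filter are omitted, and the function names are hypothetical.

```python
def filter_samples_flag(p, q, bs, alpha, beta):
    """Per-line decision: filter this line of the edge or leave it alone.

    p = [p0, p1, p2, p3] on one side (p0 nearest the edge), q likewise.
    """
    return (bs != 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


def strong_luma_filter_p(p, q, alpha, beta):
    """bS == 4 luma filter, p side only; returns the new (p0, p1, p2).

    Reads 4 samples on the p side and 2 on the q side; the q side is
    filtered symmetrically with the roles of p and q swapped.
    """
    p0, p1, p2, p3 = p
    q0, q1 = q[0], q[1]
    if abs(p2 - p0) < beta and abs(p0 - q0) < ((alpha >> 2) + 2):
        # Smooth region with a small step: modify 3 samples.
        return ((p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
                (p2 + p1 + p0 + q0 + 2) >> 2,
                (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3)
    # Otherwise only p0 is modified, with a 3-tap filter.
    return ((2 * p1 + p0 + q1 + 2) >> 2, p1, p2)


p, q = [60, 60, 60, 60], [80, 80, 80, 80]  # a 20-level step across the edge
if filter_samples_flag(p, q, bs=4, alpha=100, beta=10):
    print(strong_luma_filter_p(p, q, alpha=100, beta=10))  # (68, 65, 63)
```

The threshold values in the example are arbitrary. The point is that one decision depends on samples from both neighboring 4×4 blocks, which is why those pixels must be fetched from memory before any edge is filtered.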
In this paper, we present two efficient and low power H.264
DBF hardware implementations that can be used as part of an
H.264 video encoder or decoder for portable applications [5,
6]. The first implementation (DBF_4×4) starts filtering the
available edges as soon as a new 4×4 block is ready by using a
novel edge filtering order. The second implementation
(DBF_16×16) starts filtering the available edges after a new
16×16 MB is ready.
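A property of H.264 that is relevant to filtering per 4×4 block is the order in which the luma 4×4 blocks of an MB are produced: four 8×8 quadrants in Z order, each holding four 4×4 blocks in Z order. The check below, a sketch with a hypothetical helper `zscan_xy`, confirms that the left and top neighbors of every luma 4×4 block precede it in that order, so the samples on both sides of its left and top edges exist once the block arrives. Keeping the result bit-exact with the standard's vertical-then-horizontal filtering order imposes further constraints, which this check does not model.

```python
def zscan_xy(blk):
    """(x, y) of luma 4x4 block `blk` (0..15) in 4x4-block units within an MB,
    for the H.264 scan: 8x8 quadrants in Z order, 4x4 blocks in Z order."""
    b8, b4 = blk >> 2, blk & 3
    x = (b8 & 1) * 2 + (b4 & 1)
    y = (b8 >> 1) * 2 + (b4 >> 1)
    return x, y


order = {zscan_xy(b): b for b in range(16)}  # position -> scan index

# Inside the MB, the left and top neighbors of every 4x4 block are scanned
# earlier, so both sides of its left and top edges exist once it is ready.
# (Neighbors across the MB's own left/top border lie in earlier MBs when
# MBs are processed in raster order.)
for (x, y), b in order.items():
    if x > 0:
        assert order[(x - 1, y)] < b
    if y > 0:
        assert order[(x, y - 1)] < b

print([zscan_xy(b) for b in range(4)])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```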
The execution of the DBF_4×4 hardware can be overlapped
with the execution of the other modules in an H.264
encoder/decoder much more than that of the DBF_16×16
hardware. Overlapping the execution of the DBF hardware with
the execution of the other modules in the H.264
encoder/decoder improves the performance of the H.264
encoder/decoder. However, because of the nature of the DBF
algorithm, the control unit and address generation of the
DBF_16×16 hardware are simpler; therefore, the DBF_16×16
hardware has a smaller area and consumes less power than the
DBF_4×4 hardware.
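The performance argument can be illustrated with a deliberately simplified timing model. The sketch below assumes an upstream module that delivers the 24 4×4 blocks of a 4:2:0 MB (16 luma, 4 Cb, 4 Cr) at a fixed rate, and a DBF that spends a fixed number of cycles per block. Edge dependencies, memory accesses, and the real cycle counts of DBF_4×4 and DBF_16×16 are not modeled; all numbers and the name `mb_finish_time` are hypothetical.

```python
def mb_finish_time(n_blocks, produce, filt, overlapped):
    """Cycle at which one MB is fully produced and deblocked (toy model).

    n_blocks:   4x4 blocks per MB delivered by the upstream module
    produce:    upstream cycles per 4x4 block
    filt:       DBF cycles attributed to each 4x4 block
    overlapped: False -> wait for the whole MB, then filter (16x16-style)
                True  -> filter each block as soon as it arrives (4x4-style)
    """
    if not overlapped:
        return n_blocks * produce + n_blocks * filt
    finish = 0
    for k in range(n_blocks):
        ready = (k + 1) * produce           # block k is delivered at this cycle
        finish = max(finish, ready) + filt  # DBF waits for data or for itself
    return finish


print(mb_finish_time(24, 100, 80, overlapped=False))  # 4320
print(mb_finish_time(24, 100, 80, overlapped=True))   # 2480
```

With these made-up numbers the overlapped schedule finishes the MB sooner because filtering is hidden behind block production. The price, as noted above, is more complex control and address generation.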
Contributed Paper
Manuscript received March 28, 2008 0098-3063/08/$20.00 © 2008 IEEE
808 IEEE Transactions on Consumer Electronics, Vol. 54, No. 2, MAY 2008