A Hardware Implementation of Real-Time Video Deblocking Using Shifted Thresholding

Martin Hansen
Department of Electrical and Computer Engineering
University of Waterloo
Waterloo, Ontario, Canada
Email: mdbhanse@uwaterloo.ca

Alexander Wong
Department of Electrical and Computer Engineering
University of Waterloo
Waterloo, Ontario, Canada
Email: a28wong@uwaterloo.ca

William Bishop
Department of Electrical and Computer Engineering
University of Waterloo
Waterloo, Ontario, Canada
Email: wdbishop@uwaterloo.ca

Abstract— Video compression has become increasingly important as demand has grown for the storage and transmission of digital video content. Popular video compression schemes such as MPEG make use of block-transform coding techniques, which are susceptible to blocking artifacts. Recently, an efficient deblocking algorithm based on the concept of shifted thresholding has been proposed. This algorithm uses only integer arithmetic and replaces division operations with bit shifting. This paper proposes a new hardware architecture for video deblocking using shifted thresholding. A prototype system for high-performance video deblocking on an FPGA (field-programmable gate array) board is described. The prototype system leverages the reduced hardware complexity of the shifted thresholding algorithm to implement video deblocking cost-effectively on an FPGA board.

I. INTRODUCTION

With the ever-increasing need for efficient storage and transmission of digital video content, video compression has become an active area of research. Video compression is essential for applications ranging from high-definition video broadcasting to the wireless transmission of video content to portable entertainment systems. Popular video compression schemes such as MPEG [1] and more recent schemes such as H.264/AVC [2] make use of block-transform coding, in which blocks of pixels are processed independently to reduce computational and storage requirements.
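To make the block structure and the shift-for-division idea concrete, here is a minimal Python sketch. It is not the authors' hardware design and not part of the shifted thresholding algorithm. It assumes a 640×480 frame of 8-bit greyscale pixels (the format the prototype uses) stored row-major in a flat list. That memory layout is our assumption. The helper names `pixel_address`, `block_pixels`, `iter_blocks`, and `block_mean_shift` are hypothetical. The sketch tiles the frame into independent 8×8 blocks and replaces a division by 64 (the pixel count of one block) with a right shift by 6. The block mean is only an illustrative integer computation, and the round-half-up offset of 32 is our own choice.

```python
"""Hypothetical sketch: tiling a 640x480 8-bit greyscale frame into 8x8 blocks
using shift-only integer arithmetic. Not the authors' hardware design."""

WIDTH, HEIGHT = 640, 480      # frame size used by the prototype
BLOCK_SHIFT = 3               # 8 = 2**3 pixels per block side
BLOCK = 1 << BLOCK_SHIFT
AREA_SHIFT = 2 * BLOCK_SHIFT  # 64 = 2**6 pixels per block


def pixel_address(x: int, y: int) -> int:
    """Row-major offset of pixel (x, y) in the (assumed) flat frame buffer.

    640 = 512 + 128, so y * 640 is computed as (y << 9) + (y << 7),
    which replaces a multiplier with two shifts and an adder.
    """
    return (y << 9) + (y << 7) + x


def block_pixels(frame: list[int], bx: int, by: int) -> list[int]:
    """Return the 64 pixels of block (bx, by) in raster order."""
    x0, y0 = bx << BLOCK_SHIFT, by << BLOCK_SHIFT
    return [frame[pixel_address(x0 + dx, y0 + dy)]
            for dy in range(BLOCK) for dx in range(BLOCK)]


def iter_blocks(frame: list[int]):
    """Yield (bx, by, pixels) for every 8x8 block; blocks are processed independently."""
    for by in range(HEIGHT >> BLOCK_SHIFT):
        for bx in range(WIDTH >> BLOCK_SHIFT):
            yield bx, by, block_pixels(frame, bx, by)


def block_mean_shift(pixels: list[int]) -> int:
    """Rounded mean of 64 non-negative ints: (sum + 32) >> 6 instead of sum / 64."""
    return (sum(pixels) + (1 << (AREA_SHIFT - 1))) >> AREA_SHIFT


if __name__ == "__main__":
    # Synthetic test frame (hypothetical content): pixel value = (x + y) mod 256.
    frame = [(x + y) & 0xFF for y in range(HEIGHT) for x in range(WIDTH)]
    means = [block_mean_shift(p) for _, _, p in iter_blocks(frame)]
    print(len(frame), len(means))                        # 307200 4800
    print(pixel_address(639, 479))                       # 307199
    print(block_mean_shift(block_pixels(frame, 0, 0)))   # 7
```

A 640×480 frame therefore splits into 80 × 60 = 4800 blocks of 64 pixels each. Because 64 is a power of two, the normalization in `block_mean_shift` requires no divider. In hardware, this corresponds to selecting bits rather than instantiating a division circuit.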
A significant drawback of block-transform video coding is that blocking artifacts are introduced at block boundaries. These artifacts noticeably degrade video quality, particularly when the video content is compressed at a high compression ratio. To improve video quality, a process known as video deblocking is used to reduce the impact of blocking artifacts.

A large number of video and image deblocking methods have been introduced. These methods have been categorized [3] as follows: 1) projections onto convex sets (POCS) methods, 2) spatial block boundary filtering methods, 3) wavelet filtering methods, 4) statistical modeling methods, 5) constrained optimization methods, and 6) shifted transform methods. Traditionally, methods based on spatial block boundary filtering have been used in real-time video decoding because of their low computational complexity. Interest has since grown in shifted transform methods, which typically deliver improved deblocking quality. However, the computational complexity of shifted transform methods has traditionally been very high because of the large number of floating-point calculations they require.

Recently, an efficient deblocking algorithm based on the concept of shifted thresholding was proposed [3]. This algorithm uses only a fraction of the computations required by traditional shifted transform methods. Furthermore, it requires only integer computations and uses bit shifting to replace division operations. Despite these simplifications, the algorithm achieves image quality that is competitive with other methods in its class, and it is well suited to implementation on inexpensive hardware.

This paper presents an efficient hardware architecture for video deblocking using shifted thresholding. The proposed architecture is described in detail in Section 2. The hardware complexity of the proposed architecture is analyzed in Section 3.
A prototype design is presented in Section 4 along with experimental results. Conclusions are drawn in Section 5.

0840-7789/07/$25.00 ©2007 IEEE

II. PROPOSED ARCHITECTURE

The proposed hardware architecture implements a shifted thresholding algorithm for video deblocking [3] on an Altera DE2 board [4]. The shifted thresholding algorithm transforms a decompressed input image into a deblocked output image. The algorithm performs six distinct operations, as illustrated in Fig. 1.

The proposed hardware architecture assumes greyscale bitmap images of 640×480 pixels, with each pixel represented by 8 bits. However, the architecture could easily be modified to support larger images and wider colour pixel representations.

A six-stage pipeline architecture was chosen for the hardware design. Note that, for the sake of hardware optimization, the pipeline stages deviate slightly from the six operations described above. The first pipeline stage loads image data from memory in blocks of 8×8 pixels,