IEEE Transactions on Consumer Electronics, Vol. 50, No. 1, FEBRUARY 2004
Contributed Paper
Manuscript received November 11, 2003
0098-3063/04/$20.00 © 2004 IEEE

Fast Multiplication-free QWDCT for DV Coding Standard

Antonio Silva, Paulo Gouveia, and Antonio Navarro, Member, IEEE

Abstract
This paper deals with the fast computation of some operations of the digital video (DV) coding standard. The proposed solution converts floating-point arithmetic operations into integer arithmetic operations, replacing computationally demanding blocks such as the discrete cosine transform (DCT), weighting (W) and quantization (Q) with fast integer calculations that use only shifts and additions. The overall computational complexity was reduced by 73% in comparison to a floating-point implementation. Our solution is suitable for any fixed-point arithmetic processor, decreasing the cost of consumer equipment. The solution remains compatible with the standard in terms of the DCT precision requirements¹.

Index Terms
Digital Video, Discrete Cosine Transform, Fixed-point Processing, Quantization.

I. INTRODUCTION
In the last decade, several compression algorithms have been standardized by international organizations such as ISO, ITU and IEC. Video recording requires high picture quality and encoding schemes that allow random access. Most professional digital recording technologies have been defined by private companies. However, digital recording evolved from professional formats such as the D-series and Betacam to the consumer market through the introduction of the IEC DV standard [1], [2] and some of its variants, DVCAM [3] and DVCPRO. Rich video content can then be created by consumers, stored on servers and shared by millions of users all over the world.
¹ The authors are with the Telecommunications Institute, University of Aveiro, 3810-Aveiro, Portugal (e-mail: navarro@av.it.pt).

The DV standard was developed as a high-performance successor of the existing consumer analogue formats (VHS and Hi-8) for video/TV recording on small tape cassettes, and it is emerging as a popular alternative for digital video storage. It has a compression ratio of 3:1 to 5:1 and is suitable for devices such as digital camcorders, VCRs and video editors. The introduction of the DV standard was mainly motivated by the need for small digital camcorders subject to constraints such as recording mechanism size, cassette size, power consumption and consumer price. DV is a coding system with a fixed bit rate of 25 Mbit/s (compression ratio of 5:1); it uses intraframe compression and applies the DCT to remove redundancy from blocks of pixel data. Once the DCT is computed, the coefficients are quantized and entropy encoded (Variable Length Coding, VLC). To obtain a fixed compressed data rate of 25 Mbit/s, DV uses a feed-forward compression scheme: the appropriate quantization table/step is selected according to the “activity” of the DCT block, so that after entropy coding the data stream is close to the ideal fixed data rate.

Because the DCT is the central part of many image coding applications, all DCT-based video algorithms and standards benefit from fast DCT computation. Several floating-point DCT algorithms have been proposed; they can usually be classified into two classes, indirect and direct methods. The former compute the DCT through an FFT or other transforms, the latter through matrix factorization or recursive computation. When direct methods are chosen to calculate (N×N)-point 2-D DCTs, the conventional approach is the row-column method, which requires 2N N-point 1-D DCTs: N on the rows and N on the columns.
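To make the 2N count concrete, the following minimal Python sketch computes an (N×N)-point 2-D DCT by the row-column method and counts the N-point 1-D DCTs it performs. The orthonormal DCT-II normalization and the function names are assumptions made for illustration; each 1-D DCT is evaluated directly from its definition rather than with any of the fast algorithms cited here, and the result is checked against the direct 2-D double-sum definition.

```python
import math


def dct_1d(x):
    """Orthonormal N-point DCT-II evaluated directly from its definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d_row_column(block):
    """(N x N)-point 2-D DCT by the row-column method.

    Returns (coeffs, n_1d): coeffs[u][v] is the 2-D coefficient and
    n_1d is the number of N-point 1-D DCTs performed (always 2N).
    """
    n = len(block)
    n_1d = 0
    # Pass 1: N 1-D DCTs, one per row (horizontal frequencies v).
    row_pass = []
    for row in block:
        row_pass.append(dct_1d(row))
        n_1d += 1
    # Pass 2: N 1-D DCTs, one per column of the row-pass result (vertical u).
    col_pass = []
    for v in range(n):
        col_pass.append(dct_1d([row_pass[i][v] for i in range(n)]))
        n_1d += 1
    # col_pass[v][u] holds coefficient (u, v); transpose into coeffs[u][v].
    coeffs = [[col_pass[v][u] for v in range(n)] for u in range(n)]
    return coeffs, n_1d


def dct_2d_direct(block):
    """Reference 2-D DCT from the double-sum definition (O(N^4))."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [[c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


if __name__ == "__main__":
    block = [[7 * i + 3 * j for j in range(8)] for i in range(8)]
    coeffs, n_1d = dct_2d_row_column(block)
    reference = dct_2d_direct(block)
    err = max(abs(coeffs[u][v] - reference[u][v])
              for u in range(8) for v in range(8))
    print(f"1-D DCTs used: {n_1d}, max deviation from direct 2-D DCT: {err:.2e}")
```

For an 8×8 block the sketch performs 16 1-D DCTs; a fast 1-D algorithm could replace dct_1d without changing the row-column structure.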
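The 5:1 ratio quoted for 25 Mbit/s can be checked with a short calculation. The sketch assumes 8-bit samples and the active picture sizes of the two DV systems: 720×576 at 25 frames/s with 4:2:0 chroma sampling (625/50), and 720×480 at 30000/1001 frames/s with 4:1:1 chroma sampling (525/60). Audio, subcode and error-correction data are ignored, and the function name is illustrative.

```python
def raw_video_rate_mbps(width, height, frames_per_s, samples_per_pixel=1.5, bits=8):
    """Uncompressed active-video bit rate in Mbit/s.

    samples_per_pixel=1.5 covers both 4:2:0 and 4:1:1 chroma sampling:
    one luma sample plus, on average, half a chroma sample (Cb and Cr combined).
    """
    return width * height * samples_per_pixel * bits * frames_per_s / 1e6


DV_VIDEO_RATE_MBPS = 25.0

raw_625 = raw_video_rate_mbps(720, 576, 25)            # 625/50 system, 4:2:0
raw_525 = raw_video_rate_mbps(720, 480, 30000 / 1001)  # 525/60 system, 4:1:1

for name, raw in (("625/50", raw_625), ("525/60", raw_525)):
    print(f"{name}: {raw:.1f} Mbit/s raw, ratio {raw / DV_VIDEO_RATE_MBPS:.2f}:1")
```

Both systems give roughly 124 Mbit/s of raw active video, so a 25 Mbit/s stream corresponds to a ratio of about 5:1.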
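The feed-forward principle described above, choosing the quantization step from the data itself before encoding rather than from a buffer-fullness feedback loop, can be sketched as follows. This is a simplified, hypothetical model: the bit-cost proxy BITS_PER_NONZERO, the candidate steps and the function names are assumptions, the activity-based classification of DV blocks is not modeled, and DV's actual quantization and VLC tables are not used.

```python
# Hypothetical cost model: every nonzero quantized AC coefficient is charged a
# fixed codeword length. DV's real VLC and quantization tables differ.
BITS_PER_NONZERO = 6


def estimated_bits(coeffs, step):
    """Estimate the coded size of one block (coeffs[0] is the DC term)."""
    nonzero = sum(1 for c in coeffs[1:] if int(c / step) != 0)
    return nonzero * BITS_PER_NONZERO


def select_step(blocks, budget_bits, candidate_steps=(1, 2, 4, 8, 16)):
    """Feed-forward selection for a group of blocks.

    Each candidate step, finest first, is tried on the actual data before
    anything is encoded; the first step whose estimated size fits the budget
    is kept. If none fits, the coarsest step is used.
    Returns (step, estimated_bits).
    """
    for step in candidate_steps:
        total = sum(estimated_bits(b, step) for b in blocks)
        if total <= budget_bits:
            return step, total
    coarsest = candidate_steps[-1]
    return coarsest, sum(estimated_bits(b, coarsest) for b in blocks)


if __name__ == "__main__":
    block = [100, 40, -20, 9, 3, 1, 0, 0]  # illustrative scan-ordered coefficients
    print(select_step([block], budget_bits=20))  # (4, 18)
```

The point of the design is that the step is fixed before entropy coding, so the encoder never has to re-encode data after a buffer overflow; the cost of that guarantee is the trial quantization of every candidate step.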
In [4], [5], the authors propose two recursive 2-D DCT algorithms based on the fast 1-D DCT algorithms of [6], [7]. However, true 2-D techniques are more efficient than the conventional row-column approach. A direct 2-D method based on polynomial transform techniques was provided by Duhamel and Guillemot [8]. Feig and Winograd [9] present a factorization algorithm for the 2-D DCT matrix. In [10], Vetterli proposes an indirect method that computes the 2-D DCT by mapping it into a 2-D DFT plus a number of rotations, the 2-D DFT being computed through polynomial transform techniques. To the best of our knowledge, the fastest 2-D DCT computation [11] is the Feig-Winograd algorithm mentioned above.

Given an image with integer intensity values, the DCT transforms them into floating-point numbers (DCT coefficients), and the computational complexity of this floating-point arithmetic cannot be neglected. Efficient DCT implementations call for fixed-point arithmetic, which results in less silicon area and lower power consumption. However, fixed-point implementations have an inherent accuracy problem due to finite word length.

In this paper we propose a joint implementation of the DCT, weighting and quantization blocks (QWDCT), which considerably reduces the computational complexity associated with these operations. For comparison purposes, we developed DV reference software. These blocks account for 38% of the computational complexity of the complete DV coding algorithm. With our QWDCT implementation, the overall computational complexity of the DV reference encoder was reduced by 27%. Explicit quantization of the DCT coefficients is avoided by including the quantization values in the DCT computation. We propose an integer, multiplication-free algorithm that computes the QWDCT by replacing multiplications with shifts and additions.
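As a rough consistency check on the 38% and 27% figures, suppose that only the DCT, weighting and quantization blocks change, that they take a fraction p = 0.38 of the encoder's computational cost T, and that their own cost falls by a fraction r. An overall reduction R = 0.27 then implies:

```latex
T_{\text{new}} = (1-p)\,T + (1-r)\,p\,T
\quad\Longrightarrow\quad
R = 1 - \frac{T_{\text{new}}}{T} = r\,p
\quad\Longrightarrow\quad
r = \frac{R}{p} = \frac{0.27}{0.38} \approx 0.71
```

Under this assumption the QWDCT costs roughly 29% (1 - 0.71) of the floating-point blocks it replaces, about 3.5 times less.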
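Folding the weighting and quantization into the transform can be illustrated at the coefficient level: instead of multiplying each DCT coefficient by a weight W[u][v] and then dividing by a quantization step, a single precomputed fixed-point factor W[u][v]/step is applied. In a scaled DCT, the transform's own output scale factors could be multiplied into the same table. The sketch below is a hypothetical illustration of that idea, not the authors' QWDCT; the weight table, FRAC_BITS = 12 and the function names are assumptions.

```python
FRAC_BITS = 12


def separate_wq(coeffs, weights, step):
    """Reference path: weight each DCT coefficient, then quantize by the step
    (truncation toward zero)."""
    n = len(coeffs)
    return [[int(coeffs[u][v] * weights[u][v] / step) for v in range(n)]
            for u in range(n)]


def folded_table(weights, step, frac_bits=FRAC_BITS):
    """Precompute one fixed-point factor per coefficient: W[u][v] / step."""
    return [[round(w / step * (1 << frac_bits)) for w in row] for row in weights]


def folded_wq(coeffs, table, frac_bits=FRAC_BITS):
    """Weighting and quantization as one integer multiply and one shift.

    Sign-magnitude handling keeps truncation toward zero, as in separate_wq.
    The integer multiply could itself become a shift-and-add chain.
    """
    n = len(coeffs)
    out = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            c = coeffs[u][v]
            q = (abs(c) * table[u][v]) >> frac_bits
            out[u][v] = q if c >= 0 else -q
    return out


# Hypothetical weight table (NOT the DV weighting matrix): weights fall with u + v.
WEIGHTS = [[1.0 - (u + v) / 32 for v in range(8)] for u in range(8)]

if __name__ == "__main__":
    coeffs = [[37 * u - 23 * v for v in range(8)] for u in range(8)]
    table = folded_table(WEIGHTS, step=3)
    a = separate_wq(coeffs, WEIGHTS, 3)
    b = folded_wq(coeffs, table)
    worst = max(abs(a[u][v] - b[u][v]) for u in range(8) for v in range(8))
    print("largest difference between the two paths:", worst)
```

Because each folded factor is rounded to FRAC_BITS fractional bits, the two paths can differ by one quantization level at truncation boundaries (for W = 1 and step 3, the factor 1365/4096 is slightly below 1/3), which is the finite-word-length effect noted above.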
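Replacing a multiplication by a constant with shifts and additions works by writing the constant in fixed point and adding one shifted copy of the input per 1 bit. The minimal sketch below assumes 8 fractional bits and uses cos(π/4), a constant that appears in common 8-point DCT flow graphs; the precision and the function names are illustrative and are not the constants or word lengths used by the authors.

```python
import math


def set_bit_positions(k):
    """Positions of the 1 bits of a non-negative integer constant k."""
    return [b for b in range(k.bit_length()) if (k >> b) & 1]


def mul_shift_add(x, k, frac_bits):
    """Compute (x * k) >> frac_bits using only shifts and additions.

    k / 2**frac_bits is the fixed-point approximation of a real constant.
    """
    acc = 0
    for b in set_bit_positions(k):
        acc += x << b          # one partial product per 1 bit of k
    return acc >> frac_bits    # arithmetic shift: floor division by 2**frac_bits


FRAC_BITS = 8
C4 = math.cos(math.pi / 4)              # 0.70710678...
K4 = round(C4 * (1 << FRAC_BITS))       # 181 = 0b10110101

print(K4, set_bit_positions(K4), "additions:", len(set_bit_positions(K4)) - 1)
print(mul_shift_add(1000, K4, FRAC_BITS), "vs exact", 1000 * C4)
```

Here x × 0.7071 becomes (x<<7) + (x<<5) + (x<<4) + (x<<2) + x followed by a right shift of 8, i.e. four additions and no multiplier.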
To reduce the number of operations needed by the DCT algorithm, the multiplicative constants are approximated with several precisions. These approximations reduce the precision of the DCT while maintaining compatibility with the standard specifications, resulting in negligible subjective and objective video degradation.
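The trade-off between the precision of an approximated constant and the number of operations it costs can be tabulated with a short sketch. It rounds a constant to several numbers of fractional bits and counts the additions or subtractions needed, using the canonical signed digit (CSD) recoding, a standard technique swapped in here for illustration; the authors' approximation scheme may differ, and the constant cos(π/8), the chosen precisions and the function names are assumptions.

```python
import math


def csd_digits(k):
    """Canonical signed digit recoding of a non-negative integer k.

    Returns digits in {-1, 0, 1}, least significant first, with no two
    adjacent nonzero digits; each nonzero digit costs one shifted add/subtract.
    """
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)    # +1 if k % 4 == 1, -1 if k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def mul_csd(x, digits, frac_bits):
    """Compute (x * k) >> frac_bits from the CSD digits of k (shifts, adds, subtracts)."""
    acc = 0
    for pos, d in enumerate(digits):
        if d == 1:
            acc += x << pos
        elif d == -1:
            acc -= x << pos
    return acc >> frac_bits


def precision_table(constant, precisions):
    """For each precision (fractional bits): rounded integer constant,
    add/subtract operations of its CSD form, and absolute approximation error."""
    rows = []
    for bits in precisions:
        k = math.floor(constant * (1 << bits) + 0.5)   # round to nearest
        nonzero = sum(1 for d in csd_digits(k) if d != 0)
        error = abs(k / (1 << bits) - constant)
        rows.append((bits, k, max(nonzero - 1, 0), error))
    return rows


C2 = math.cos(math.pi / 8)   # 0.92387953..., a constant of 8-point DCT flow graphs
for bits, k, ops, err in precision_table(C2, (4, 6, 8, 10, 12)):
    print(f"{bits:2d} fractional bits: k={k:5d}, add/sub ops={ops}, |error|={err:.2e}")
```

For example, at 8 fractional bits cos(π/8) rounds to 237, which has six 1 bits in binary (five additions) but only four nonzero CSD digits (1 - 4 - 16 + 256), i.e. three add/subtract operations. Coarser precisions need fewer operations at the cost of a larger error, bounded by half a unit in the last fractional bit.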