Low-Complexity Multi-Purpose IP Core for Quantized Discrete Cosine and Integer Transform

Chi-Chia Sun†, Philipp Donner and Jürgen Götze
Information Processing Lab, Dortmund University of Technology, Germany
†e-mail: chichia.sun@tu-dortmund.de

Abstract— In this paper, a low-complexity and highly integrated IP core for image/video transformations is presented. The presented reconfigurable architecture can perform the quantized 8×8 DCT and the quantized 8×8/4×4 integer transforms using only shift and add operations. The XVID experimental results and the FPGA synthesis results show that the proposed architecture not only achieves multiplierless video transformations efficiently, but also retains good transformation quality. The proposed IP core is therefore well suited for low-complexity, multi-purpose video CODECs in SoC designs.

Index Terms— DCT, QDCT, integer transform, CORDIC, DCIT, QDCIT, low power, reconfigurable architecture, FPGA

I. INTRODUCTION

As the demand for multi-function devices has grown explosively in recent years, new design challenges have emerged, such as low power consumption, quality awareness and multi-standard integration. Therefore, we present a multi-standard image/video transformation architecture. The Discrete Cosine Transform (DCT) has long been the main component of many modern image/video compression standards and applications (e.g., JPEG, MPEG4, H.26X and so on [1], [2]). More recently, the H.264 standard replaced the DCT with an integer transform that uses block sizes of 4×4 pixels. To support profiles for HD video, an integer transform using block sizes of 8×8 was added [3].

In this paper, we present the design of a reconfigurable transformation IP core which can perform the multiplierless 1-D 8-point DCT and the 1-D 8-point/4-point integer transforms for multi-standard video CODECs by reusing the architecture of our previous CORDIC based Loeffler DCT (CLDCT) [4].
In that work, we implemented a low-power and high-quality CLDCT which not only reduces the computational complexity from 11 multiply and 29 add operations to 38 add and 16 shift operations, but also obtains a transformation quality as good as that of the original Loeffler DCT. Here, we present the integration of the 8-point/4-point integer transforms into this CLDCT, such that a reconfigurable architecture for the Discrete Cosine and Integer Transform (DCIT) is obtained, which can be used for multi-functional SoC designs.

Incorporating the quantization into the DCT, which results in the Quantized DCT (QDCT), is another important way to reduce the computational complexity. In the literature, Alen Docef et al. [5] proposed a joint implementation of Chen's DCT [6] and quantization (i.e., a conventional QDCT design). Later, Hanli Wang et al. [7] merged the quantization process, represented by a quantization matrix with variable quantization step sizes, into Chen's DCT. They called it the Novel QDCT (NQDCT). However, both of these QDCT designs still need multipliers to perform the DCT and the quantization. Even if the multipliers were replaced by a Canonical Signed Digit (CSD) representation, the QDCT and the NQDCT would still need more than 300 add operations in the worst case.

In order to realize an efficient QDCT architecture, we present a CORDIC Scaler (C-Scaler) that uses a limited number of CORDIC compensation iterations. This C-Scaler has four stages and requires 8 add and 10 shift operations. The combination of two 1-D DCITs, a row-column transposition memory and four C-Scalers forms a 2-D Quantized DCIT (QDCIT) which requires only 120 add and 80 shift operations. The final architecture not only performs the multiplierless 8×8 QDCT, but also contains reconfigurable modules such that it can perform the 8×8/4×4 quantized integer transforms. Furthermore, it retains good transformation quality compared to the Loeffler DCT in terms of PSNR results.
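As a rough illustration of the CSD substitution mentioned above, the following Python sketch recodes a non-negative integer constant into signed digits (-1, 0, +1) with no two adjacent nonzero digits, and then multiplies by that constant using only shifts, adds and subtracts. The function names `to_csd` and `csd_multiply` are hypothetical, and the sketch is a generic textbook illustration under these assumptions; it is not the C-Scaler and not the QDCT designs of [5] or [7].

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed digit recoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least significant first, with no two
    adjacent nonzero digits (illustrative helper, not from the paper).
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d           # makes n divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, constant: int) -> tuple[int, int]:
    """Multiply x by a non-negative constant using shifts and adds/subtracts.

    Returns (product, adds), where adds counts the add/subtract operations
    needed to combine the shifted terms.
    """
    acc = 0
    nonzero = 0
    for shift, d in enumerate(to_csd(constant)):
        if d == 0:
            continue
        term = x << shift
        acc = acc + term if d > 0 else acc - term
        nonzero += 1
    return acc, max(nonzero - 1, 0)


if __name__ == "__main__":
    # 7 = 8 - 1, so one subtraction replaces the two additions of 4 + 2 + 1.
    print(to_csd(7))
    print(csd_multiply(5, 7))
```

The add count returned here is per constant; in a full transform the total depends on how many constants appear and how well their partial terms can be shared.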
This paper is organized as follows. Section II briefly introduces the algorithms of the DCT, the CLDCT and the proposed reconfigurable DCIT for multiple transformations. In Section III, we present the proposed C-Scaler architecture for the multiplierless QDCIT transformation. The experimental and synthesis results are shown in Section IV. Section V concludes this paper.

II. DCT ALGORITHMS

A. The DCT Background

An 8-point 1-D DCT transforms 8 samples $f(x)$ in the spatial domain into the frequency domain $F(k)$ as follows:

$$F(k) = \frac{1}{2}\, C(k) \sum_{x=0}^{7} f(x) \cos\left[\frac{(2x+1)k\pi}{16}\right], \qquad C(k) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{if } k = 0, \\ 1 & \text{otherwise.} \end{cases} \tag{1}$$

A commonly used approach to construct the 2-D DCT is the row-column decomposition method. The decomposition performs a row-wise 1-D transform followed by a column-wise 1-D transform, with an intermediate transposition. This decomposition approach has two advantages. First, the computational complexity is significantly reduced. Second, the