LOW POWER DCT IMPLEMENTATION APPROACH FOR VLSI DSP PROCESSORS

S. Masupe and T. Arslan

The University of Edinburgh
Department of Electronics and Electrical Engineering
Mayfield Rd., Edinburgh EH9 3JL, United Kingdom

ABSTRACT

This paper presents an algorithm for the low power implementation of the Discrete Cosine Transform on single-multiplier CMOS DSPs. The algorithm reduces power by combining two measures: using shift operations where possible, and manipulating the bit correlation between successive cosine coefficients applied to the input of the multiplier section so that the effective switched capacitance is reduced. This reduces the switching activity in the multiplier section of a Discrete Cosine Transform processor. The paper describes the algorithm and the evaluation procedure, and presents results for a number of example images illustrating up to 50% power savings.

1. INTRODUCTION

Currently there is considerable interest in the low power implementation of the Discrete Cosine Transform (DCT). This is mainly due to the DCT being the computational bottleneck of standards such as JPEG and MPEG [1]. Most research on low power implementation of the DCT has targeted reducing the computational complexity of the design or modifying it for operation under a lower supply voltage [1, 2]. Both of these techniques have a limited effect on power reduction. Another major contributor to power consumption is the effective switched capacitance [3, 4]. Only a few researchers have targeted reducing the power of a DCT implementation through a reduction in the amount of switched capacitance. This reduction has been achieved through techniques such as the detection of zero-valued DCT coefficients and lookup table partitioning [5]. This paper presents a technique for reducing the power dissipation of the DCT by targeting the multiplier section of a DCT processor.
The pivot of this technique is a multiplication algorithm for the low power implementation of the DCT on CMOS-based signal processing systems. The algorithm reduces power consumption by reducing the effective switched capacitance of the multiplier through effective manipulation of the multiplication process between the cosine and data matrices. Our results indicate that the effective capacitance can be reduced significantly by performing the multiplications of the DCT in an order dictated by the amount of correlation between successive cosine matrix elements applied to one of the inputs of the multiplier section. In addition, we propose a modified processor architecture that accommodates this multiplication scheme and is open to exploitation by other ordering algorithms. We demonstrate our scheme with an example of ordering the cosine coefficients according to minimum Hamming distance, revealing up to 50% power savings on a number of image examples.

2. IMPLEMENTATION

The computational bottleneck in the implementation of the DCT is the multiplication of the cosine coefficient matrix [E] by the pixel matrix [D] to obtain the DCT coefficients [C], i.e.

$$[C] = [E] \times [D] \qquad (1)$$

where each element of the matrix [C] of order n is given by:

$$c_{ik} = \sum_{j=0}^{n-1} e_{ij}\, d_{jk}, \qquad i, k = 0, 1, \ldots, n-1 \qquad (2)$$

Traditionally, the multiplication is performed in a row-by-column fashion [6, 7]; see Equation (2). In

I-149 0-7803-5474-5/99/$10.00 (C) 1999 IEEE