EFFICIENT IMPLEMENTATIONS OF WAVELET TRANSFORMS – A ROADMAP

Y. Andreopoulos 1, P. Schelkens 1, T. Stouraitis 2, J. Cornelis 1

1 Vrije Universiteit Brussel/IMEC, Dept. ETRO, Pleinlaan 2, B-1050 Brussel, Belgium
{yandreop,pschelke,jpcornel}@etro.vub.ac.be

2 University of Patras, Dept. ECE, VLSI Design Laboratory, Rio 26500, Greece
thanos@ee.upatras.gr

Abstract: The two major implementation methods for the discrete, two-dimensional binary-tree wavelet decomposition are presented. They are proposed in the context of efficient coupling with the coding algorithms of compression standards, namely JPEG-2000 and MPEG-4. When implemented in software or hardware systems, they are capable of producing in real time the binary-tree decomposition of the entire input image with a higher sample rate. This is achieved by dividing and localizing the processing into small blocks of data. These blocks can be handled efficiently by a cache hierarchy in a programmable processor or by a custom-hardware design.

Keywords: Low-Memory/High Bandwidth Multimedia Implementations, Local Wavelet Transform, Line-Based Wavelet Transform, Recursive Pyramid Algorithm.

1 INTRODUCTION AND BACKGROUND

Wavelets have been introduced as a highly efficient and flexible method for subband decomposition of signals [1]. In digital signal processing they have demonstrated good algorithmic performance, especially in the field of image compression [2] [3] [10]. However, the implementation aspects of the wavelet decomposition received little attention until recently. It was the inclusion of wavelets in the multimedia compression standards MPEG-4 [4] and JPEG-2000 [5] that triggered the ongoing efforts to improve the implementation of the transform.

Focusing on subband decompositions of digital signals, the 1D multilevel binary-tree wavelet decomposition can be depicted as in Figure 1. The input signal is convolved with the kernels h_0 and h_1, and each output is subsampled by two.
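The filter-and-subsample structure of Figure 1 can be illustrated with a minimal sketch. It assumes the two-tap Haar kernels for h_0 and h_1 and a periodic signal extension, neither of which is specified in the text; the function names `filter_and_subsample` and `decompose_1d` are hypothetical.

```python
import math

# Haar analysis kernels, chosen only for illustration (an assumption).
H0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass kernel h_0
H1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass kernel h_1


def filter_and_subsample(signal, kernel):
    """Convolve `signal` with `kernel` and keep only the odd-indexed outputs.

    The signal is extended periodically (an assumption of this sketch).
    """
    n = len(signal)
    out = []
    for m in range(n // 2):
        i = 2 * m + 1  # output position kept after subsampling by two
        acc = 0.0
        for k, tap in enumerate(kernel):
            acc += tap * signal[(i - k) % n]
        out.append(acc)
    return out


def decompose_1d(signal, levels):
    """Binary-tree decomposition: only the low-pass branch is split again.

    Returns [detail_1, detail_2, ..., detail_L, approx_L].
    """
    subbands = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2 at every level")
        detail = filter_and_subsample(approx, H1)
        approx = filter_and_subsample(approx, H0)
        subbands.append(detail)
    subbands.append(approx)
    return subbands
```

Calling `decompose_1d([1, 2, 3, 4, 5, 6, 7, 8], 3)` returns four lists of lengths 4, 2, 1 and 1: the three high-pass outputs, finest level first, followed by the final low-pass output.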
The reader is referred to many excellent texts for an introduction to such subband decompositions and their specific properties [1] [6] [7] [8] [9].

Figure 1. One-dimensional binary-tree multilevel decomposition of the input signal f.

The classical method to extend the decomposition of Figure 1 to two dimensions (images) is the separate application of the transform to the rows and columns of the input 2D signal. This provided the first algorithm for the 2D decomposition, which, from the implementation point of view, is called the Row-Column Wavelet Transform (RCWT). This is the classical implementation used in a variety of image compression systems [2] [3] [10]. The first implementation improvements came from replacing the convolution-based filtering-and-subsampling with a decomposition based on "lifting steps" [9]. This reduced the number of arithmetic operations, since the intermediate results of the filtering are reused (lifted) to produce the subband coefficients [9].

However, it is now acknowledged by various researchers [11] [12] [13] that the wavelet transform, as used for multimedia applications, is a data-dominated application that requires high data-access rates. Thus the transform-production techniques surveyed in this paper focus on localizing the processing in small memory components that can be designed to achieve a higher efficiency in an actual implementation, rather than on reducing the arithmetic operations.
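The row-column approach and the reuse of intermediate results in lifting can be sketched as follows. This is an illustration under stated assumptions, not the implementation of any cited system: it uses the integer 5/3 lifting steps (the filter pair used for reversible coding in JPEG-2000) with symmetric boundary extension, and the names `lift_53`, `row_column_level` and `rcwt` are hypothetical.

```python
def lift_53(x):
    """One level of integer 5/3 lifting on an even-length list of ints.

    Returns (low, high). Symmetric boundary extension is an assumption.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("length must be even and at least 2")
    half = n // 2
    # Predict step: odd sample minus the floored mean of its even neighbours.
    high = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        high.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: reuses the high-pass values just computed (the "lifting").
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        low.append(x[2 * i] + ((prev + high[i] + 2) >> 2))
    return low, high


def row_column_level(image):
    """Filter every row, then every column of both row outputs.

    Band names: first letter = row filtering, second = column filtering.
    """
    rows = [lift_53(list(r)) for r in image]
    row_low = [lo for lo, _ in rows]
    row_high = [hi for _, hi in rows]

    def columns(block):
        cols = [lift_53(list(c)) for c in zip(*block)]
        low = [list(r) for r in zip(*(lo for lo, _ in cols))]
        high = [list(r) for r in zip(*(hi for _, hi in cols))]
        return low, high

    ll, lh = columns(row_low)
    hl, hh = columns(row_high)
    return ll, hl, lh, hh


def rcwt(image, levels):
    """Multilevel row-column transform: only the LL band is decomposed again."""
    pyramid = []
    ll = image
    for _ in range(levels):
        ll, hl, lh, hh = row_column_level(ll)
        pyramid.append({"HL": hl, "LH": lh, "HH": hh})
    return ll, pyramid
```

Note that `row_column_level` reads the whole image once by rows and once by columns at every level. This access pattern is what makes the transform data-dominated, independently of how few arithmetic operations each lifting step needs.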