Abstract—This paper proposes a new data access scheme for computing the lifting-based 2-D discrete wavelet transform (DWT) using systolic arrays with block processing. A linear systolic array is derived directly from the dependence graph (DG). A 2-D systolic array is derived from a suitably segmented DG for parallel and pipelined implementation of the 1-D DWT. These two systolic arrays are used as building blocks to derive the lifting 2-D DWT structure. The proposed architecture requires a small on-chip memory of (4N + 8P), where N is the image width, and processes a block of P samples in every cycle. Compared with existing structures, it has higher throughput, lower latency, and lower computational complexity. Synthesis is performed in Xilinx 8.1i targeting Spartan 2E hardware with the XC2S50E device and FT256 package, and simulation results are obtained using MATLAB 7.10 and ModelSim 6.3f. For an image size of 512 × 512 and a block size of 4, the area is 987500.22 sq. µm, the power consumed is 8.34027 mW, and the delay is 16.11 ns.

Keywords—Block processing, Discrete Wavelet Transform (DWT), Lifting, Systolic VLSI, 2-D DWT.

I. INTRODUCTION

Two-dimensional discrete wavelet transform (2-D DWT) has evolved as an effective and powerful tool in many applications, especially in image processing and compression. This is mainly due to the better computational efficiency achieved by factoring wavelet transforms. DWT structures are mainly classified into two types: (i) convolution-based and (ii) lifting-based. The lifting scheme facilitates high-speed and efficient implementation of the wavelet transform, and it is attractive for both high-throughput and low-power applications. Lifting requires fewer arithmetic and memory resources than convolution [1], [2]. The hardware components of a 2-D DWT structure are broadly classified into arithmetic components and memory components. The arithmetic components include multipliers, and the memory components comprise transposition memory and temporal memory.
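As a software illustration of the lifting idea (split, predict, update), the following is a minimal sketch and not the systolic hardware described in this paper. It assumes the reversible integer 5/3 filter with symmetric boundary extension, since this section does not state which wavelet filter is used; the function names `lifting_53_forward`, `lifting_53_inverse`, and `dwt2d_one_level` are hypothetical.

```python
def lifting_53_forward(x):
    """One level of 1-D integer 5/3 lifting (assumed filter); len(x) must be even."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length signal with at least 2 samples")
    even, odd = list(x[0::2]), list(x[1::2])   # split into polyphase components
    half = n // 2
    # Predict step (high-pass): d[i] = odd[i] - floor((even[i] + even[i+1]) / 2).
    # Right boundary uses symmetric extension: even[half] is taken as even[half-1].
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update step (low-pass): s[i] = even[i] + floor((d[i-1] + d[i] + 2) / 4).
    # Left boundary uses symmetric extension: d[-1] is taken as d[0].
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by reversing the update and predict steps."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


def dwt2d_one_level(image):
    """Separable one-level 2-D DWT: row pass, then column pass on each half."""
    rows = [lifting_53_forward(row) for row in image]
    low = [s for s, _ in rows]     # row low-pass half
    high = [d for _, d in rows]    # row high-pass half

    def column_pass(block):
        # zip(*block) transposes the block so columns can be filtered as rows;
        # this is the software analogue of reading data through a transposition buffer.
        cols = [lifting_53_forward(list(col)) for col in zip(*block)]
        s_cols = [s for s, _ in cols]
        d_cols = [d for _, d in cols]
        return [list(r) for r in zip(*s_cols)], [list(r) for r in zip(*d_cols)]

    ll, lh = column_pass(low)      # row-low/column-low, row-low/column-high
    hl, hh = column_pass(high)     # row-high/column-low, row-high/column-high
    return ll, lh, hl, hh
```

Because each lifting step is undone by subtracting exactly what was added, the inverse reconstructs the input exactly, which is one reason lifting maps well onto compact hardware.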
Transposition memory stores input/intermediate coefficients, whereas temporal memory stores partial results of the filter output.

ThirumaraiSelvi Chandraraju is with the Electronics and Communication Engineering Department, Sri Krishna College of Engineering and Technology, Coimbatore-641008, Tamilnadu, INDIA (Phone: 9944192456; e-mail: selvichand@gmail.com).
Sudhakar RadhaKrishnan was with the Electronics and Communication Engineering Department, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamilnadu, INDIA (e-mail: sudha_radha2000@yahoo.co.in).

In the parallel data access scheme of Cheng et al. [5], the size of the transposition memory is reduced, while the temporal memory remains independent of the data access scheme and the input block size. Therefore, in 2-D DWT structures based on a parallel data access scheme, the on-chip memory is dominated by temporal memory. The line-based structure in [4] requires temporal memory of size 3N to process 4 samples per cycle, and the parallel scanning lifting scheme [6] involves the same size of temporal memory as the line-based structure. The proposed systolic-array block processing scheme makes efficient use of temporal memory to reduce the area-time complexity of the 2-D DWT structure. Block-based parallel and pipelined architectures are used in the implementation of 2-D DWT in [7], [8]. Both structures have the same throughput rate and the same arithmetic resources, but their transposition memory sizes differ and vary with the size of the input data matrix. Mohanty et al. [7] obtained data blocks by folding rows; the temporal memory size is 3N and the transposition memory size is 2.5N for 1-level 2-D DWT. Tian [8] derived the data blocks from P-row parallel data access; the transposition memory size is N(P + 2)/2 and the temporal memory size is 3N, where P is the block size. The structure in [8] requires transposition memory to buffer the intermediate blocks, and the blocks are processed in a different order than the input data matrix.
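To make the comparison concrete, the memory expressions quoted above can be evaluated for the image width and block size reported in the abstract (N = 512, P = 4). This is only an arithmetic sketch: the function name `onchip_memory` is hypothetical, treating total on-chip memory as temporal plus transposition memory is an assumption, and the unit is whatever the cited expressions use (it is not stated in this section).

```python
def onchip_memory(n, p):
    """Evaluate the memory expressions quoted in the text for image width n
    and block size p (n is assumed even so the divisions are exact)."""
    temporal = 3 * n                      # temporal memory quoted for [7] and [8]
    return {
        # [7]: temporal 3N + transposition 2.5N
        "Mohanty et al. [7]": temporal + (5 * n) // 2,
        # [8]: temporal 3N + transposition N(P + 2)/2
        "Tian [8]": temporal + (n * (p + 2)) // 2,
        # Proposed: total on-chip memory 4N + 8P, as stated in the abstract
        "Proposed": 4 * n + 8 * p,
    }


if __name__ == "__main__":
    for name, size in onchip_memory(512, 4).items():
        print(f"{name}: {size}")
    # Mohanty et al. [7]: 2816
    # Tian [8]: 3072
    # Proposed: 2080
```

Under these assumptions the transposition term of [8] grows with P, while the 8P term of the proposed expression stays small relative to 4N for practical block sizes.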
The transposition memory size depends on the block size as well as on the image size. The on-chip memory in [8] depends on the block size, whereas for block sizes ≥ 4 the on-chip memory in [7] is independent of the block size and is smaller than that in [8]. This paper suggests a data access scheme that suitably partitions the computation and maps it onto an appropriate hardware architecture to derive a memory-efficient and area-power-efficient block-based 2-D DWT structure.

II. EXISTING WORK

A modular and pipelined architecture for lifting-based multilevel 2-D DWT [7] performs appropriate partitioning and scheduling at each decomposition level. The different levels are processed using a cascaded pipeline architecture. This structure uses the pyramid algorithm and one recursive pyramid algorithm. The entire processing is based on a unit input block size. It uses local registers and RAM instead of buffers for data storage and processes an image of size 512 × 512. The main drawback of this method is its large on-chip memory, which requires more area and power for

Design of 2-D DWT VLSI Architecture for Image Compression
ThirumaraiSelvi Chandraraju and Sudhakar RadhaKrishnan
Int'l Conference on Advanced Computational Technologies & Creative Media (ICACTCM’2014), Aug. 14-15, 2014, Pattaya (Thailand)
http://dx.doi.org/10.15242/IIE.E0814532