IMPACT-2009

A High-Speed, Hierarchical 16×16 Array of Array Multiplier Design

Abhijit Asati¹ and Chandrashekhar²
¹ EEE Group, BITS, Pilani, India, abhijitmicro@gmail.com
² CEERI, Pilani, India, chandra@ceeri.ernet.in

Abstract—Array multipliers are preferred for smaller operand sizes because of their simpler VLSI implementation, in spite of their linear time complexity. Tree multipliers have a time complexity of O(log n) but are less suitable for VLSI implementation: being less regular, they require a larger total routing length, which may degrade their performance. Some hybrid architectures, called ‘array of array’ multipliers, offer intermediate performance. These multipliers have a time complexity better than that of array multipliers and therefore become an obvious choice for higher-performance multiplier designs of moderate operand sizes. In this paper a 16×16 unsigned ‘array of array’ multiplier circuit is designed with a hierarchical structure and implemented using conventional CMOS logic in the 0.6 μm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS. The proposed multiplier implementation shows a large reduction in propagation delay and in average power consumption (at 20 MHz) compared to the 16-bit Booth-encoded Wallace tree multiplier of F. Jalil [3]. The total transistor count, maximum instantaneous power, leakage power, core area, total routing length and number of vias are also presented.

I. INTRODUCTION

The multiplier is a fundamental building block in standard digital signal processors and ASIC digital signal processors used for digital signal processing. Multiplication is used in many neural computing and DSP applications such as instrumentation and measurement, communications, audio and video processing, graphics, image enhancement, 3-D rendering, navigation, radar and GPS, and in control applications such as robotics, machine vision and guidance.
It is mainly used to implement algorithms such as frequency domain filtering (FIR and IIR), frequency-time transformations (FFT) and correlation. Most DSP tasks require real-time processing, so the multiplier must perform these tasks quickly while minimizing cost and power.

Multiplication algorithms differ in the means of ‘partial product generation’ and ‘partial product addition’ [1]. The array multiplier has linear time complexity, i.e., O(n), so its delay degrades for larger operand sizes. It also has poor space complexity, O(n²), as it requires approximately n² cells to produce the product. Therefore, as the operand size grows, the circuit takes more area and power [2], [5], [6].

Radix-m Booth encoding, where m = 2ⁿ, reduces the number of partial product rows by a factor of n. Radix-4 Booth encoding (m = 4 = 2²) reduces the number of partial product rows by a factor of two [3]. Since the number of partial product rows is halved, the hardware required to generate the partial products is reduced to n²/2 cells [2].

Wallace tree multipliers reduce the ripple effect and therefore produce products in far less time. The time complexity is reduced to O(log n), but a larger routing area is required than for regular array multipliers, making them less suitable for VLSI implementation [2]. The hardware reduction of the Booth encoding scheme can be combined with accelerated Wallace tree accumulation of the partial products to obtain a reduced time complexity of O(log n), which is well suited to multipliers with large operand sizes [2], [3].

In the sub-micron and deep sub-micron era, the performance of tree-based architectures may degrade for multipliers of moderate operand sizes because of their larger routing lengths. In this range some hybrid architectures perform better, since gate-level analysis of these architectures shows moderate area and delay.
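As a rough illustration of the row reduction that radix-4 Booth encoding provides, the following Python sketch recodes an operand into radix-4 digits and multiplies with them. It assumes signed two's-complement operands of even width n; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and are not taken from this paper or from [3].

```python
def booth_radix4_digits(x: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement integer into n/2 radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and corresponds to one
    partial product row. Assumption: n is even and x fits in n bits.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    if not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x does not fit in n bits")

    u = x & ((1 << n) - 1)                   # two's-complement bit pattern
    bits = [(u >> i) & 1 for i in range(n)]  # bits[0] is the LSB

    digits = []
    prev = 0                                 # implicit bit to the right of the LSB
    for i in range(0, n, 2):
        # Overlapping 3-bit window: digit = b[i-1] + b[i] - 2*b[i+1]
        digits.append(prev + bits[i] - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply a by b using one partial product per Booth digit of b."""
    product = 0
    for i, d in enumerate(booth_radix4_digits(b, n)):
        product += d * a * 4 ** i            # row i is weighted by 4^i
    return product


if __name__ == "__main__":
    n = 16
    rows = len(booth_radix4_digits(12345, n))
    print(f"{n}-bit operand: {rows} Booth rows instead of {n}")
    print(booth_multiply(-300, 12345, n) == -300 * 12345)
```

Under these assumptions a 16-bit operand yields 8 digits, so 8 partial product rows are generated instead of 16, which matches the factor-of-two reduction stated for radix-4 encoding.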
These hybrid multiplier architectures have moderate area requirements and a time complexity of O(√N) [4].

In this paper we present a hierarchical implementation of a 16×16 multiplier design using the array of array technique. The VLSI implementation of the multiplier circuit uses conventional CMOS logic in the 0.6 μm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS. Simulation results are compared with the Booth-encoded Wallace tree multiplier of [3]. Section II explains the design of a 2×2 multiplier, Section III describes the hierarchical design of a 4×4 multiplier, and Section IV describes the hierarchical design of the 8×8 and 16×16 multipliers. Physical implementation and results are described in Section V. Section VI concludes the paper.

978-1-4244-3604-0/09/$25.00 ©2009 IEEE

II. DESIGN OF A 2×2 MULTIPLIER

In this architecture the 2×2 unsigned multiplier is used as the basic building block in the hierarchical design of a multiplier with a larger bit size. The truth table for a 2×2 combinational multiplier is shown in Table I. The truth table can be solved using K-