IMPACT-2009
A High-Speed, Hierarchical 16×16 Array of Array
Multiplier Design
Abhijit Asati¹ and Chandrashekhar²
¹EEE Group, BITS, Pilani, India, abhijitmicro@gmail.com
²CEERI, Pilani, India, chandra@ceeri.ernet.in
Abstract—Array multipliers are preferred for smaller operand
sizes because of their simpler VLSI implementation, in spite of
their linear time complexity. Tree multipliers have a time
complexity of O(log n) but are less suitable for VLSI
implementation: being less regular, they require a larger
total routing length, which may degrade their performance.
Hybrid architectures called ‘array of array’ multipliers
offer intermediate performance. Their time complexity is
better than that of array multipliers, which makes them
an obvious choice for higher-performance multiplier designs of
moderate operand size. In this paper, a 16×16 unsigned ‘array
of array’ multiplier circuit is designed with a hierarchical
structure and implemented using conventional CMOS logic in
the 0.6 μm N-well CMOS process (SCN_SUBM, lambda = 0.3) of
MOSIS. The proposed implementation shows a large
reduction in propagation delay and in average power
consumption (at 20 MHz) compared with the 16-bit Booth-encoded
Wallace tree multiplier of F. Jalil [3]. The total transistor count,
maximum instantaneous power, leakage power, core area, total
routing length and number of vias are also presented.
I. INTRODUCTION
The multiplier is a fundamental building block in standard
digital signal processors and ASIC digital signal processors
used for digital signal processing. Multiplication is
used in many neural computing and DSP applications such as
instrumentation and measurement, communications, audio and
video processing, graphics, image enhancement, 3-D
rendering, navigation, radar, GPS, and control applications
such as robotics, machine vision and guidance. It is mainly used to
implement algorithms such as frequency domain filtering (FIR
and IIR), frequency-time transformations (FFT) and correlation.
Most DSP tasks require real-time processing, so the multiplier
must perform them quickly while minimizing cost and
power. Multiplication algorithms differ in their means of
‘partial product generation’ and ‘partial product addition’ [1].
The array multiplier has linear time complexity, i.e. O(n),
so its delay degrades as the operand size grows. It also has
poor space complexity, O(n^2), as it requires
approximately n^2 cells to produce the product. Therefore, as
the operand size grows, the circuit takes more area and power
[2], [5], [6]. Radix-m Booth encoding, where m = 2^k, reduces
the number of partial product rows by a factor of k. Booth radix-4
(m = 4 = 2^2) encoding can reduce the number of partial product
rows by a factor of two [3]. Since the number of partial
product rows is halved, the hardware required to
generate partial products is reduced to n^2/2 cells [2]. In
Wallace tree multipliers, the ripple effect is reduced, so they
produce products in far less time. The time complexity is
reduced to O(log n), but a larger routing area is required
than for regular array multipliers, making them less
suitable for VLSI implementation [2]. The hardware
reduction of the Booth encoding scheme can be
combined with accelerated Wallace tree accumulation of
partial products to obtain the reduced time complexity of
O(log n); such multipliers are well suited to large operand
sizes [2], [3]. In the sub-micron/deep sub-micron era, for
multipliers of moderate operand size, where tree-based
architectures may suffer degraded performance due to larger
routing lengths, some hybrid architectures show better
performance, since gate-level analysis of these architectures
shows moderate area and delay. These multiplier
architectures have moderate area requirements and time
complexity of ) ( N O [4]. In this paper we present a
hierarchical implementation of a 16×16 multiplier design using
the array of array technique. The VLSI implementation of the
multiplier circuit uses the 0.6 μm N-well CMOS process
(SCN_SUBM, lambda = 0.3) of MOSIS and conventional
CMOS logic. Simulation results are compared with the Booth-
encoded Wallace tree multiplier of [3]. Section II explains the
design of a 2×2 multiplier, Section III describes the hierarchical
design of a 4×4 multiplier, and Section IV describes the hierarchical
design of the 8×8 and 16×16 multipliers. Physical
implementation and results are described in Section V. Section
VI concludes the paper.
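
The hierarchy outlined above (2×2 blocks composed into 4×4, 8×8 and then 16×16 multipliers) can be illustrated with a small behavioral sketch in Python. It is not the circuit of this paper: the function names mul2x2 and mul_hier are hypothetical, the 2×2 block uses the common AND/half-adder form as an assumption, and the four sub-products are combined with ordinary integer addition as a stand-in for whatever adder structure the hardware uses.

```python
def mul2x2(a, b):
    """Behavioral 2x2 unsigned multiplier built from AND and XOR only.

    Assumed gate-level form (four ANDs plus two half adders); the
    paper's own K-map-derived equations may differ.
    """
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                  # least significant product bit
    t1, t2 = a1 & b0, a0 & b1     # the two middle partial products
    p1 = t1 ^ t2                  # half adder: sum
    c1 = t1 & t2                  # half adder: carry
    t3 = a1 & b1
    p2 = t3 ^ c1                  # second half adder: sum
    p3 = t3 & c1                  # second half adder: carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def mul_hier(a, b, n):
    """n x n unsigned multiply (n a power of two, n >= 2) from four n/2 x n/2 blocks."""
    if n == 2:
        return mul2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    ll = mul_hier(a_lo, b_lo, h)  # low  x low
    lh = mul_hier(a_lo, b_hi, h)  # low  x high
    hl = mul_hier(a_hi, b_lo, h)  # high x low
    hh = mul_hier(a_hi, b_hi, h)  # high x high
    # Integer addition stands in for the hardware adders (assumption).
    return ll + ((lh + hl) << h) + (hh << n)


if __name__ == "__main__":
    print(mul_hier(0xFFFF, 0xFFFF, 16) == 0xFFFF * 0xFFFF)
```

In this sketch each level splits both operands in half, so a 16×16 product is assembled from four 8×8 products, each of those from four 4×4 products, and each of those from four 2×2 blocks.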
II. DESIGN OF A 2×2 MULTIPLIER
In this architecture the 2×2 unsigned multiplier is used as a
basic building block in a hierarchical design of a larger bit size
multiplier. The truth table for a 2×2 combinational multiplier
is shown in table I. The truth table can be solved using K-
978-1-4244-3604-0/09/$25.00 ©2009 IEEE 161