FPGA-BASED DIGIT-SERIAL COMPLEX NUMBER
MULTIPLIER-ACCUMULATOR
T. Sansaloni¹, J. Valls¹, K.K. Parhi²
¹ Dpto. Ing. Electrónica. Univ. Politécnica de Valencia, EPS Gandía.
Ctra. Nazaret-Oliva sn. 46730 Grao de Gandía, Spain.
² Dept. of Electrical and Computer Engineering. Univ. of Minnesota.
200 Union Street SE. Minneapolis, MN 55455, USA.
ABSTRACT
This paper presents an FPGA implementation of digit-serial
Complex Number Multiplier-Accumulators (CMACs) based on
Booth recoding techniques and Carry Save (CS) adders. The
Complex Number Multiplier-Accumulators can be pipelined at
LUT-level. An efficient mapping of the Booth recoding and the
partial product generation is presented which results in a logic
depth reduction. The combination of 5-3 and 4-3 converters in the
CS structure and the use of Ripple Carry Adder (RCA)
trees lead to a minimum area requirement.
1. INTRODUCTION
In the last decade Digital Signal Processing (DSP) applications
have shown an increase in the use of field programmable logic
devices (FPLDs). This trend will also continue in the future [1].
FIR and IIR filters, as well as FFT processors require efficient
implementation of building blocks like adders, multipliers and
complex-number multiplier-accumulators. Among these
elements, the CMAC is the most critical in terms of time, area and
power.
In recent years, multiple structures using different number
representations have been reported. Distributed Arithmetic (DA)
[2], [3], [4], [5] requires constant coefficients and reduces the
number of operations in the CMAC. Redundant representations
and Booth recoding [6] lead to carry-free addition/subtraction
and a reduction in partial products, respectively. However, this also
increases the required area of the implementation. A CMAC
using two’s complement number representation and a digit-serial
architecture (digit-size d=2 and data-size w=4 bits) is
described in [7]. A multiplier using two’s complement number
representation and a digit-serial architecture with digit-sizes d=2, 4
and 8 was published in [8]. However, all these architectures
have been implemented in VLSI. A systematic analysis of the speed
and area of DA-based CMACs in FPGAs is published in [9], but
the architectures described there cannot be pipelined at bit
level.
This paper presents a modified architecture of CMACs in FPGAs
based on the ideas reported in [7] and [8], both of which are
digit-serial systems [10], [11]. The result is a bit-level pipelined
architecture that achieves a very high sampling rate together with a
reduction in area. The proposed structures use
data operands of w=4, 8 and 16 bits with digit-sizes
of d=2, 4 and 8 bits, respectively. All the structures
have been implemented in Xilinx FPGA devices.
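As a rough illustration of what digit-serial processing means, the following Python sketch adds two w-bit words d bits per cycle, carrying between cycles, so a word takes w/d cycles. It is only a behavioural model under assumed names (`digit_serial_add` is hypothetical) and is not the CMAC architecture proposed in this paper.

```python
def digit_serial_add(a, b, w, d):
    """Add two w-bit unsigned words, d bits per cycle.

    Returns (sum modulo 2**w, number of cycles). Behavioural model only.
    """
    assert w % d == 0, "the word length must be a multiple of the digit size"
    digit_mask = (1 << d) - 1
    carry = 0
    result = 0
    cycles = w // d
    for k in range(cycles):
        # Extract the k-th digit (d bits) of each operand, LSB digit first.
        da = (a >> (k * d)) & digit_mask
        db = (b >> (k * d)) & digit_mask
        s = da + db + carry
        result |= (s & digit_mask) << (k * d)
        carry = s >> d  # carry is held until the next cycle
    return result, cycles


# w=8, d=2: the 8-bit addition takes 4 cycles.
print(digit_serial_add(0x5A, 0x3C, 8, 2))
```

A larger digit size d lowers the number of cycles per word at the cost of more logic per cycle, which is the trade-off explored by varying d.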
The paper is organized as follows: the algorithm of complex
number multiplication is described in section 2. Section 3
describes the FPGA implementations of building blocks used in
the CMAC. The implementation results (area, throughput,
efficiency and latency) are presented in section 4. Finally, the
conclusions are summarized.
2. ALGORITHM
The Complex Number Multiplier-Accumulator performs the
operation (1), where A, B, C are complex numbers.
A ⋅ B + C (1)
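For reference, operation (1) can be written with real arithmetic only. The Python sketch below uses the standard expansion with four real multiplications; the source does not state at this point which decomposition the hardware uses, so this is an assumption, and the function name `cmac` is hypothetical.

```python
def cmac(a, b, c):
    """Return A*B + C for complex operands given as (real, imag) integer pairs."""
    ar, ai = a
    br, bi = b
    cr, ci = c
    # (ar + j*ai) * (br + j*bi) = (ar*br - ai*bi) + j*(ar*bi + ai*br)
    real = ar * br - ai * bi + cr
    imag = ar * bi + ai * br + ci
    return real, imag


print(cmac((3, -2), (1, 4), (5, 6)))  # (16, 16)
```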
The use of the modified Booth recoding scheme in multipliers
reduces the number of partial products by a factor of two. Each
Booth-encoded digit selects a multiple of the multiplicand X
from the set {-2X, -X, 0, X, 2X}. Therefore, all
partial products can be obtained by simple shift and/or invert
operations. A ‘1’ is added to the LSB when the operation is -2X
or -X to obtain the two’s complement of the partial product. This is
done by adding S_i in a later step (figure 1).
The sign extension of the operand A has been done by using the
sign-generate technique [12].
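The recoding and partial product generation described above can be sketched in Python as follows. This is a behavioural model under assumptions, not the LUT mapping of the paper: the function names are hypothetical, rows are kept at the full 2w-bit width instead of using the sign-generate technique, and the correction bits S_i are collected and added after the rows.

```python
def booth_radix4_digits(b, w):
    """Recode a w-bit two's-complement multiplier into w/2 digits in {-2..2}."""
    assert w % 2 == 0
    bits = [(b >> i) & 1 for i in range(w)]  # also valid for negative ints
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, w, 2):
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits  # least-significant digit first


def partial_product(x, digit, n):
    """Return (row, s): an n-bit row built by shift/invert and its LSB correction."""
    mask = (1 << n) - 1
    multiple = abs(digit) * x  # 0, X or 2X (2X is a one-bit left shift)
    if digit < 0:
        return (~multiple) & mask, 1  # invert now, add the '1' later
    return multiple & mask, 0


def booth_multiply(x, b, w):
    """Multiply two w-bit two's-complement integers via radix-4 Booth recoding."""
    n = 2 * w
    mask = (1 << n) - 1
    rows = 0
    corrections = 0
    for i, digit in enumerate(booth_radix4_digits(b, w)):
        row, s = partial_product(x, digit, n)
        rows += row << (2 * i)
        corrections += s << (2 * i)  # the S_i bits, added in a later step
    acc = (rows + corrections) & mask
    return acc - (1 << n) if acc >> (n - 1) else acc  # back to a signed value


print(booth_radix4_digits(-3, 4))  # [1, -1]
```

For w=4 the multiplier yields two digits, hence two partial products instead of four.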
[Figure 1: operands A3 A2 A1 A0 and B3 B2 B1 B0; two partial-product rows (rows 0 and 1), each with a leading 1 followed by -P4 P3 P2 P1 P0; correction bits S1 and S0; result X6 X5 X4 X3 X2 X1 X0.]
Figure 1. Sign-generate algorithm.
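One common way to avoid sign extension when summing two's-complement rows is to complement the sign bit of each row and add a constant fixed at design time. The Python sketch below illustrates that idea only; the exact arrangement of the sign-generate technique in [12] and in figure 1 may differ, and all names and the 5-bit row width are assumptions.

```python
def to_pattern(value, n):
    """n-bit two's-complement bit pattern of a signed integer."""
    return value & ((1 << n) - 1)


def to_signed(pattern, n):
    """Signed value of an n-bit two's-complement bit pattern."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


def sum_rows_without_sign_extension(rows, n, shift, total_bits):
    """Sum n-bit rows (row k weighted by 2**(k*shift)) without sign extension.

    A row with sign bit s and lower bits l has the value -s*2**(n-1) + l,
    which equals (1-s)*2**(n-1) + l - 2**(n-1). So each sign bit is
    complemented and the -2**(n-1) terms are gathered into one constant.
    """
    mask = (1 << total_bits) - 1
    acc = 0
    constant = 0
    for k, row in enumerate(rows):
        flipped = row ^ (1 << (n - 1))  # complement the sign bit only
        acc += flipped << (k * shift)   # treated as an unsigned row
        constant -= 1 << (n - 1 + k * shift)
    return (acc + constant) & mask


# Two 5-bit rows, the second weighted by 4, summed into 8 bits.
rows = [to_pattern(-7, 5), to_pattern(9, 5)]
print(to_signed(sum_rows_without_sign_extension(rows, 5, 2, 8), 8))  # 29
```

Because the constant depends only on the row width and positions, it can be precomputed, which is why fixed 1s can appear in the partial-product array.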
3. FPGA IMPLEMENTATION OF THE
DIGIT-SERIAL CMACS
In this section the implementation details of the digit-serial
CMAC main blocks in Xilinx devices are presented. This
includes the proposed mapping for the partial product generators
with Booth recoding, the adders that perform the addition of the
partial products and the chosen topology for the final carry
propagate adder. Additionally, the extension of the modified
Booth recoding algorithm to larger digit sizes and pipelining
techniques are also explained.
0-7803-5482-6/99/$10.00 ©2000 IEEE
ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland