FPGA-BASED DIGIT-SERIAL COMPLEX NUMBER MULTIPLIER-ACCUMULATOR

T. Sansaloni¹, J. Valls¹, K.K. Parhi²
¹ Dpto. Ing. Electrónica. Univ. Politécnica de Valencia, EPS Gandía. Ctra. Nazaret-Oliva sn. 46730 Grao de Gandía, Spain.
² Dept. of Electrical and Computer Engineering. Univ. of Minnesota. 200 Union Street SE. Minneapolis, MN 55455, USA.

ABSTRACT

This paper presents an FPGA implementation of digit-serial Complex Number Multiplier-Accumulators (CMACs) based on Booth recoding techniques and Carry Save (CS) adders. The CMACs can be pipelined at the LUT level. An efficient mapping of the Booth recoding and the partial product generation is presented, which reduces the logic depth. The combination of 5-3 and 4-3 converters in the CS structure and the use of Ripple Carry Adder (RCA) trees minimize the required area.

1. INTRODUCTION

In the last decade, Digital Signal Processing (DSP) applications have made increasing use of field programmable logic devices (FPLDs), and this trend is expected to continue [1]. FIR and IIR filters, as well as FFT processors, require efficient implementations of building blocks such as adders, multipliers and complex number multiplier-accumulators. Among these elements, the CMAC is the most critical in terms of time, area and power.

In recent years, several structures using different number representations have been reported. Distributed Arithmetic (DA) [2], [3], [4], [5] requires constant coefficients and reduces the number of operations in the CMAC. Redundant representations and Booth recoding [6] lead to carry-free addition/subtraction and to a reduction in the number of partial products, respectively; however, they also increase the area of the implementation. A CMAC using two's complement number representation and a digit-serial architecture (digit size d=2 and data size w=4 bits) is described in [7].
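To make the digit-serial idea concrete: a digit-serial operator processes a w-bit word d bits per clock cycle, least significant digit first, over w/d cycles, and keeps inter-digit state (such as a carry) in a register between cycles. The Python sketch below models a digit-serial two's complement adder, the kind of operation an accumulator needs. The function names are hypothetical, w is assumed to be a multiple of d, and the model ignores pipelining and timing, so it is a behavioural illustration rather than the architecture of [7].

```python
def to_digits(x: int, w: int, d: int) -> list[int]:
    """Split a w-bit two's complement word into w/d digits of d bits, LSD first."""
    u = x & ((1 << w) - 1)                      # raw w-bit pattern
    return [(u >> (k * d)) & ((1 << d) - 1) for k in range(w // d)]


def from_digits(digits: list[int], w: int, d: int) -> int:
    """Reassemble LSD-first digits into a signed w-bit value."""
    u = sum(dig << (k * d) for k, dig in enumerate(digits))
    return u - (1 << w) if u >> (w - 1) else u


def digit_serial_add(x_digits: list[int], y_digits: list[int], d: int) -> list[int]:
    """Add two words one d-bit digit per simulated clock cycle."""
    carry = 0                                   # carry register, cleared at word start
    out = []
    for xd, yd in zip(x_digits, y_digits):      # one iteration = one clock cycle
        s = xd + yd + carry
        out.append(s & ((1 << d) - 1))          # d-bit output digit
        carry = s >> d                          # carried into the next cycle
    return out                                  # final carry dropped: result wraps mod 2**w


print(to_digits(100, 8, 2))  # [0, 1, 2, 1]
```

With w=8 and d=2, for instance, the word 100 (0b01100100) enters the adder as the digit stream [0, 1, 2, 1] over four cycles.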
A multiplier using two's complement number representation and a digit-serial architecture with digit sizes d=2, 4 and 8 was published in [8]. However, all of these architectures were implemented in VLSI. A systematic analysis of the speed and area of DA-based CMACs in FPGAs is published in [9]; however, the architectures described there cannot be pipelined at the bit level.

This paper presents a modified CMAC architecture for FPGAs based on the ideas reported in [7] and [8], both of which are digit-serial systems [10], [11]. The result is a bit-level pipelined architecture that achieves a very high sampling rate while also reducing area. The proposed structures use data operands of w=4, 8 and 16 bits, with digit sizes of d=2, 4 and 8 bits, respectively. All the structures have been implemented in Xilinx FPGA devices.

The paper is organized as follows: section 2 describes the complex number multiplication algorithm, section 3 describes the FPGA implementation of the building blocks used in the CMAC, and section 4 presents the implementation results (area, throughput, efficiency and latency). Finally, the conclusions are summarized.

2. ALGORITHM

The Complex Number Multiplier-Accumulator performs operation (1), where A, B and C are complex numbers:

$$A \cdot B + C \qquad (1)$$

Using the modified Booth recoding scheme in multipliers halves the number of partial products. Each Booth-encoded digit selects a multiple of the multiplicand X from the set {-2X, -X, 0, X, 2X}, so every partial product can be obtained with simple shift and/or invert operations. When the selected multiple is -2X or -X, a '1' must be added at the LSB of the inverted partial product to complete its two's complement; this is done by adding the bit $S_i$ in a later step (Figure 1). Sign extension of operand A is handled with the sign-generate technique [12].
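As a behavioural check of the recoding described above, the following Python sketch (all function names are hypothetical) recodes a w-bit two's complement multiplier into radix-4 Booth digits, forms each partial product by shifting and/or inverting the multiplicand, adds $S_i$ separately at the row's LSB, and then evaluates (1) on (real, imaginary) integer pairs. It assumes the direct expansion $A \cdot B + C = (A_r B_r - A_i B_i + C_r) + j(A_r B_i + A_i B_r + C_i)$ with four real products; the paper's CMAC may organize the products differently, and this model ignores digit-serial timing entirely.

```python
def booth_digits(b: int, w: int) -> list[int]:
    """Radix-4 (modified) Booth digits of a w-bit two's complement b, LSD first."""
    if w % 2:
        raise ValueError("this sketch assumes an even operand width w")
    u = b & ((1 << w) - 1)            # raw w-bit pattern of b
    digits, prev = [], 0              # prev models the implicit bit B[-1] = 0
    for i in range(0, w, 2):
        lo, hi = (u >> i) & 1, (u >> (i + 1)) & 1
        digits.append(-2 * hi + lo + prev)   # digit in {-2, -1, 0, 1, 2}
        prev = hi
    return digits


def booth_multiply(a: int, b: int, w: int) -> int:
    """a*b from w/2 Booth partial products built by shift/invert plus S_i."""
    mask = (1 << (w + 1)) - 1         # a partial product needs w+1 bits
    product = 0
    for i, d in enumerate(booth_digits(b, w)):
        m = (a << 1) if abs(d) == 2 else (a if d != 0 else 0)  # select 2X, X or 0
        s = 1 if d < 0 else 0         # S_i completes the two's complement of -X/-2X
        p = (m & mask) ^ (mask if s else 0)                     # invert for negative digits
        p_signed = p - (1 << (w + 1)) if p >> w else p
        product += (p_signed + s) << (2 * i)   # S_i enters at the row's LSB
    return product


def cmac(a, b, c, w):
    """A*B + C on (real, imag) pairs, using four real Booth products (an assumption)."""
    (ar, ai), (br, bi), (cr, ci) = a, b, c
    re = booth_multiply(ar, br, w) - booth_multiply(ai, bi, w) + cr
    im = booth_multiply(ar, bi, w) + booth_multiply(ai, br, w) + ci
    return re, im


print(cmac((3, -2), (1, 4), (5, 6), 4))  # (16, 16), i.e. (3-2j)(1+4j) + (5+6j)
```

Note that the multiplicand only ever passes through a shift and an XOR with the sign of the digit, which is what allows the partial product generator to be a small, shallow logic block.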
[Figure 1. Sign-generate algorithm: partial-product array for A3 A2 A1 A0 × B3 B2 B1 B0, with Booth partial-product rows 1, ~P4,i, P3,i, P2,i, P1,i, P0,i (i = 0, 1; ~ denotes the inverted sign bit), the correction bits S1 and S0, and the product bits X6 ... X0.]

3. FPGA IMPLEMENTATION OF THE DIGIT-SERIAL CMACS

This section presents the implementation details of the main blocks of the digit-serial CMAC in Xilinx devices: the proposed mapping of the partial product generators with Booth recoding, the adders that perform the addition of the partial products, and the topology chosen for the final carry-propagate adder. The extension of the modified Booth recoding algorithm to larger digit sizes and the pipelining techniques are also explained.

IV-585 0-7803-5482-6/99/$10.00 ©2000 IEEE ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland
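The partial-product array of Figure 1, which the CS adders must sum, can be checked numerically. The Python sketch below assumes the sign-generate idea amounts to complementing each partial product's sign bit and folding all sign-extension corrections into one precomputed constant (for w=4 and two rows, 0b10110000). That is one standard formulation of sign-extension avoidance; the exact placement of the constant '1' bits in Figure 1 and in [12] may differ. The rows are arbitrary (w+1)-bit signed values, so the check does not depend on a particular Booth encoder, and the function name is hypothetical.

```python
def sign_generate_sum(rows: list[int], s_bits: list[int], w: int) -> int:
    """Add (w+1)-bit signed rows P_i (weight 4**i) and bits S_i without sign extension.

    Each row's sign bit is complemented and a single precomputed constant
    corrects the sum; the result is read as a 2w-bit two's complement number.
    """
    n = 2 * w
    mask_n = (1 << n) - 1
    # Complementing row i's sign bit raises its unsigned value by 2**(2i+w); undo it once.
    k = -sum(1 << (2 * i + w) for i in range(len(rows))) & mask_n
    total = k
    for i, (p, s) in enumerate(zip(rows, s_bits)):
        pattern = p & ((1 << (w + 1)) - 1)               # row as a (w+1)-bit pattern
        total += ((pattern ^ (1 << w)) + s) << (2 * i)   # inverted sign bit, S_i at LSB
    total &= mask_n                                       # keep only the n product bits
    return total - (1 << n) if total >> (n - 1) else total


# Rows for (-8) x (-8) with w=4: P_0 = 0, S_0 = 0 and P_1 = 15, S_1 = 1.
print(sign_generate_sum([0, 15], [0, 1], 4))  # 64
```

Because the constant depends only on w and the number of rows, in hardware it reduces to a few fixed '1' bits in the array rather than long runs of replicated sign bits.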