IEEE International Symposium on Circuits and Systems, Volume II, pp. 785-788, May 2004, Vancouver, Canada

GIGAHERTZ-RANGE MCML MULTIPLIER ARCHITECTURES

Venkat Srinivasan, Dong Sam Ha, and Jos B Sulistyo
VTVT (Virginia Tech VLSI for Telecommunications) Laboratory, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA
E-mail: {vsriniva, ha, jsulisty}@vt.edu
http://www.ee.vt.edu/~ha/

ABSTRACT

In this paper, we present three digital multiplier architectures capable of operating in the gigahertz range, based on the MOS Current Mode Logic (MCML) style. A small library of MCML logic gates consisting of NAND/AND, XOR/XNOR, (3x2) counter (full adder), [4:2] compressor, and master-slave flip-flop was designed and optimized for high-speed operation. Using these gates, we propose three different 8-bit MCML binary-tree multiplier architectures and compare their performance in terms of latency, throughput (number of multiplications per second), and power consumption. According to our simulations, the fastest multiplier, targeting TSMC 0.18 µm CMOS technology, attains a throughput of 4.76 GHz, or 4.76 billion multiplications per second, and a latency of 3.8 ns.

1. INTRODUCTION

The increasing demand for fast arithmetic units in floating-point co-processors, graphics processing units, and DSP chips has created a need for highly integrated, high-speed multipliers. Traditionally, multiplier architectures fall into one of two categories: array multipliers and tree multipliers. The latency of an array multiplier is a linear function of the word length n, O(n), whereas the latency of a tree multiplier is a logarithmic function of the word length, O(log n). Hence, tree structures require fewer stages for partial product reduction than array structures and are more suitable for high-speed multiplier designs.
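The difference between linear and logarithmic reduction depth can be illustrated with a short stage-counting sketch. It is not taken from the paper: the function names are hypothetical, the array model assumes one carry-save row per additional partial product, and the tree model assumes that each full group of rows is compressed to two rows per stage, with three leftover rows handled by a 3:2 counter.

```python
def array_stages(rows: int) -> int:
    """Stages to reduce `rows` partial products to 2 in a linear array.

    Assumption: each carry-save row absorbs exactly one more partial product.
    """
    return max(rows - 2, 0)


def tree_stages(rows: int, counter_inputs: int) -> int:
    """Stages to reduce `rows` partial products to 2 in a tree.

    counter_inputs = 3 models 3:2 counters, counter_inputs = 4 models
    [4:2] compressors. Rows that do not fill a group pass through,
    except that three leftover rows go through a 3:2 counter.
    """
    stages = 0
    while rows > 2:
        full, rest = divmod(rows, counter_inputs)
        rows = 2 * full + (2 if rest == 3 else rest)
        stages += 1
    return stages


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, array_stages(n), tree_stages(n, 3), tree_stages(n, 4))
```

Under these assumptions, eight partial products need 6 stages in the array model, 4 stages with 3:2 counters (8, 6, 4, 3, 2 rows), and 2 stages with [4:2] compressors (8, 4, 2 rows). The counts describe reduction depth only and say nothing about the delay of an individual stage or of the final adder.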
To enhance the throughput, we pipelined our multipliers by inserting a register stage after every compressor cell. The ability to build logic gates that operate at high speed while dissipating relatively little power makes MOS current mode logic (MCML) a promising technique for designing gigahertz-range arithmetic circuits. Our high-speed pipelined tree multipliers exploit several attractive features of MCML, as described later. A small library of MCML logic gates consisting of NAND/AND, XOR/XNOR, 3x2 counter (full adder), [4:2] compressor, and flip-flop forms the core of our multipliers; these gates were designed and optimized for high-speed operation. We propose three 8-bit MCML multiplier architectures: a 3-2 tree architecture with a ripple carry adder, a 4-2 tree architecture with a ripple carry adder, and a 4-2 tree architecture with a carry look-ahead adder.

Section 2 covers the basics of MCML and discusses the design metrics and tradeoffs involved in MCML gate design. Section 3 describes the design of the MCML gates in our library, discusses the optimization techniques adopted for the design, and provides simulation results. Section 4 presents our three 8-bit MCML multiplier architectures. Section 5 compares the performance of the proposed multiplier architectures and presents simulation results. Finally, Section 6 summarizes our research.

2. MOS CURRENT MODE LOGIC (MCML)

The operation of an MCML gate may be understood with the help of the basic structure shown in Figure 1 [1]. It consists of two load resistors $R_L$, a differential pull-down network (PDN) with complementary sets of inputs and outputs, and a constant current source $I_{CS}$.

Figure 1: Basic Structure of an MCML Gate

The differential inputs (complementary sets) are applied to the pull-down network (PDN).
The PDN has a tree-like differential structure, similar to that of the Differential Cascode Voltage Switch (DCVS) logic family [2]. The output and its complement are available at the two arms, as indicated in the figure. The PDN is grounded through a constant current source $I_{CS}$, which is usually an NMOS transistor. The voltage swing at the output and its complement is $\Delta V = I_{CS} R_L$; it is controlled by setting the value of the current source $I_{CS}$ and the effective value of $R_L$, which is usually implemented as a PMOS transistor. The voltage swing is in the range of