IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 121

High Performance Low Power Array Multiplier Using Temporal Tiling

Shivaling S. Mahant-Shetti, Poras T. Balsara, Senior Member, IEEE, and Carl Lemonds, Member, IEEE

Abstract—Digital multipliers are a major source of power dissipation in digital signal processors. The array architecture is a popular way to implement these multipliers because of its regular, compact structure. High power dissipation in these structures is mainly due to the switching of a large number of gates during multiplication. In addition, much power is dissipated by a large number of spurious transitions on internal nodes. Timing analysis of the full adder, the basic building block of array multipliers, has led to a different array connection pattern that reduces the power dissipated by spurious transition activity. This connection pattern also improves multiplier throughput. The array pattern is based on creating a compact tiled structure in which the shape of each tile represents the delay through that tile; consequently, a compact arrangement of these tiles is, by construction, a high-throughput structure. This temporal tiling technique can also be applied to other digital circuits. In our simulation studies, a temporally tiled array multiplier achieves 50% and 35% improvements in delay and power dissipation, respectively, compared with a conventional array multiplier. The improvement in delay can be traded for lower power using voltage reduction techniques.

Index Terms—Array multiplier, Booth encoding, low power, temporal tiling.

I. INTRODUCTION

The multiplier circuit is a core component of most present-day digital signal processors (DSP’s). Studies of power dissipation in DSP’s indicate that multipliers are among the most power-hungry components on these chips. The array multiplier is one of the most popular multiplier architectures because of its simple and regular interconnect.
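To make "spurious transitions on internal nodes" concrete, the following Python sketch offers one possible illustration; it is not the authors' simulator. It builds a small unsigned array multiplier from AND gates and a full adder made of two XOR gates and a multiplexor (sum = a ⊕ b ⊕ c; carry = c when a ⊕ b = 1, otherwise a), applies a single input change, and simulates it under an assumed unit delay per gate. A node whose final value changed needs one toggle; every toggle beyond that is spurious. The netlist format, all function names (`array_multiplier`, `unit_delay_transitions`, and so on), the unit-delay model, and the use of a plain ripple-carry array instead of Booth elements are hypothetical simplifications.

```python
"""Count spurious (glitch) transitions in a small array multiplier.

Hypothetical sketch: gate-level netlist, unit delay per gate, plain
(non-Booth) ripple-carry array. Not the simulator used in the paper.
"""


def evaluate(kind, vals):
    """Evaluate one gate on a list of 0/1 input values."""
    if kind == "and":
        return vals[0] & vals[1]
    if kind == "xor":
        return vals[0] ^ vals[1]
    if kind == "mux":
        sel, d1, d0 = vals
        return d1 if sel else d0
    raise ValueError(f"unknown gate type: {kind}")


def full_adder(gates, a, b, c, name):
    """Append a two-XOR + multiplexor full adder; return (sum, carry) nodes."""
    p, s, co = f"{name}.p", f"{name}.s", f"{name}.co"
    gates.append(("xor", (a, b), p))      # p = a XOR b
    gates.append(("xor", (p, c), s))      # sum = p XOR c
    gates.append(("mux", (p, c, a), co))  # carry = c if p else a
    return s, co


def array_multiplier(n):
    """Build an unsigned n x n ripple-carry array multiplier netlist."""
    gates = []
    pp = [[f"pp{i}_{j}" for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            gates.append(("and", (f"a{j}", f"b{i}"), pp[i][j]))
    acc = {j: pp[0][j] for j in range(n)}  # bit position -> node holding it
    for i in range(1, n):                  # one adder row per multiplier bit
        carry = "zero"
        for j in range(n):
            pos = i + j
            acc[pos], carry = full_adder(
                gates, acc.get(pos, "zero"), pp[i][j], carry, f"fa{i}_{j}")
        acc[i + n] = carry                 # row carry-out becomes a new bit
    outputs = [acc.get(k, "zero") for k in range(2 * n)]
    return gates, outputs


def settle(gates, inputs):
    """Zero-delay steady state of every node for the given primary inputs."""
    values = {"zero": 0, **inputs}
    for _, _, out in gates:
        values.setdefault(out, 0)
    changed = True
    while changed:
        changed = False
        for kind, ins, out in gates:
            v = evaluate(kind, [values[x] for x in ins])
            if values[out] != v:
                values[out], changed = v, True
    return values


def unit_delay_transitions(gates, old_inputs, new_inputs):
    """Switch inputs old -> new with unit gate delay.

    Returns (total toggles, necessary toggles, final node values).
    """
    values = settle(gates, old_inputs)
    initial = dict(values)
    values.update(new_inputs)  # primary inputs switch at t = 0
    toggles = 0
    while True:
        # All gates re-evaluate simultaneously from the previous time step.
        nxt = {out: evaluate(kind, [values[x] for x in ins])
               for kind, ins, out in gates}
        step = sum(values[out] != v for out, v in nxt.items())
        if step == 0:
            break
        toggles += step
        values.update(nxt)
    necessary = sum(values[out] != initial[out] for _, _, out in gates)
    return toggles, necessary, values


def bits(prefix, value, n):
    """Map an integer onto named input bits, e.g. a0..a{n-1}."""
    return {f"{prefix}{k}": (value >> k) & 1 for k in range(n)}


def read_product(values, outputs):
    """Assemble the product from the output node values."""
    return sum(values[node] << k for k, node in enumerate(outputs))


if __name__ == "__main__":
    n = 4
    gates, outputs = array_multiplier(n)
    old = {**bits("a", 0, n), **bits("b", 0, n)}
    new = {**bits("a", 15, n), **bits("b", 15, n)}
    toggles, necessary, final = unit_delay_transitions(gates, old, new)
    print("product:", read_product(final, outputs))
    print("toggles:", toggles, "necessary:", necessary,
          "spurious:", toggles - necessary)
```

Because the unit-delay model ignores real gate and wire delays, the absolute counts are only indicative. The point of the sketch is that unequal arrival times along the carry and sum paths can make internal nodes toggle more than once before they settle.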
However, recent research on signal transition activity indicated that array multipliers have an architectural disadvantage [1]. This is mainly due to nonuniform path delays in the structure, which result in multiple signal transitions on internal nodes before they settle to a final value. These multiple transitions are spurious or redundant and, consequently, dissipate unnecessary power. In fact, a recent study of an array multiplier showed that almost 50% of its power was due to these spurious transitions [2]. In the past, power improvements in array multipliers have been obtained through bottom-up analysis: given an array topology, the characteristics of its constituents or its structure are modified to yield lower power dissipation. Spurious transitions can be reduced by equalizing the path delays from inputs to outputs, using latches and/or self-timed circuit techniques with replicated circuit blocks, as proposed by Lemonds et al. [2]. Ko et al. [3] showed that using inverters instead of replicating the blocks is also effective. Further improvements can be made by judiciously increasing the delay of subblocks, i.e., by delay balancing with added elements, as demonstrated by Sakuta et al. [4]. All of these methods rely on providing signals just when they are needed in order to avoid unnecessary transitions, and they achieve this by introducing additional logic, which incurs an area and power penalty. Lerouge et al. [5] proposed a method for improving the speed of array multipliers by equalizing the delays of the carry and sum paths. They rearranged the array multiplier cells into three groups that work in parallel, so that the sum outputs are produced at almost the same time as the carry outputs. In this paper, we invert the process: instead of balancing delays by modifying components or introducing delay elements, we use the existing components, delay imbalances included, and arrange them into an overall delay-balanced structure. We start by analyzing an efficient full adder from which a multiplier can be built, and then derive an array topology for the multiplier that reduces the waiting between signals at the various intermediate stages. The resulting structure is still an array design. Because the interconnect structure of this multiplier skips rows, we call it a “leapfrog” multiplier.

Manuscript received May 22, 1997; revised February 27, 1998. S. S. Mahant-Shetti and C. Lemonds are with the DSPS R&D Center, Texas Instruments Incorporated, Dallas, TX 75265 USA. P. T. Balsara is with the Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75083 USA. Publisher Item Identifier S 1063-8210(99)00705-2.

1063–8210/99$10.00 © 1999 IEEE

Fig. 1. Schematic diagram of the full adder.

II. ADDER ANALYSIS

The array multiplier consists of Booth elements, each of which adds the previous partial sum to a term derived from the multiplicand based on two bits of the multiplier. The Booth element in turn consists of a multiplexor and a full adder. In this section, we examine the adder in some detail. We chose the two-stage adder configuration, consisting of two exclusive-OR (XOR) or exclusive-NOR (XNOR) gates and a multiplexor, because it can be realized compactly. Fig. 1 shows the schematic diagram of the full-adder circuit used in the