Half V_DD Clock-Swing Flip-Flop with Reduced Contention for up to 60% Power Saving in Clock Distribution

David Levacq (1), Muhammad Yazid (2), Hiroshi Kawaguchi (3), Makoto Takamiya (1), Takayasu Sakurai (1)
(1) levacq@iis.u-tokyo.ac.jp, Center for Collaborative Research, University of Tokyo, Tokyo, Japan
(2) Fujitsu Microsolutions, Kanagawa, Japan
(3) Kobe University, Kobe, Japan

Abstract: A new low clock-swing flip-flop (F/F) is proposed. Existing low clock-swing F/Fs consume high power, incur a speed penalty due to contention currents, or require a large silicon area because of the separate well needed for substrate biasing. By reducing contention currents, our proposal efficiently mitigates those issues. Measurements and simulations are carried out based on a 90 nm CMOS process, demonstrating reductions of active power by 71%, area by 36% and delay by 35% compared to previous proposals. It is shown that the combination of a low clock-swing distribution tree with the new F/F can save up to 60% of the total clock system power.

I. INTRODUCTION

Reducing the power consumption of VLSI circuits is a growing concern. A direct solution is to reduce the system supply voltage V_DD. However, this can only be done at the expense of speed, which can be unacceptable in high-performance systems. Another solution is to reduce the clock voltage swing V_CK without reducing V_DD. This has been shown to be an efficient way to reduce power dissipation, because clock distribution is a major contributor to the power dissipated in VLSI circuits (20 to 45% of the total chip power) [1]. If V_DD for the logic circuits is kept high, this technique has little impact on speed, provided that the flip-flops (F/Fs) can operate efficiently under a reduced V_CK.

A first simple idea is to insert low-to-high level converters in front of conventional F/Fs to regenerate a full-swing clock signal [2][3] (Low-to-High converter D-F/F (LHDFF), Fig. 1a).
Theoretically, neglecting clock skew issues, this technique has no impact on system performance, since the logic critical path is not affected. However, it does not translate into large power savings: the voltage swing is reduced only on the clock-tree distribution lines, while the large number of low-to-high level converters consumes considerable power. A more efficient approach is therefore to implement F/Fs that can directly receive a reduced-swing clock.

In [4], two separate half-swing clock signals are distributed across the chip: the first swings from zero to half V_DD to control the NMOSFETs, the second swings from half V_DD to V_DD to control the PMOSFETs. While this technique has little impact on speed, the need to distribute two clock signals complicates routing and skew adjustment.

The previously proposed Reduced Clock Swing F/F [1] (RCSFF, Fig. 1b) requires only one clock signal, swinging between 0 and a low voltage V_CK < V_DD. However, it is clear from Fig. 1b that when the clock signal is high, the clocked PMOSFETs cannot be efficiently turned off, resulting in a direct current path from V_DD to ground and high power dissipation. This difficulty can be partially circumvented by connecting the n-well of the clocked PMOSFETs to a high bias voltage, which increases their threshold voltage (V_th) and thereby reduces the leakage. But this requires laying out those transistors in a separate well, and generating and distributing a bias voltage above the standard V_DD, which complicates the design and increases the circuit area. Moreover, the bias voltage that can be applied to the separate well is limited by reliability constraints. Last but not least, the precharge-discharge cycles inside the F/F cause unnecessary power dissipation when the input signal is kept constant over several clock cycles.
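The turn-off problem of the clocked PMOSFETs in the RCSFF can be illustrated with a first-order calculation. The sketch below is not taken from the paper: it assumes that the PMOSFET source sits at V_DD, uses the textbook body-effect expression for the threshold shift, and treats the device as off only when its source-gate voltage is below |V_th| (subthreshold leakage is ignored). All numerical values (V_DD, threshold, body-effect coefficient, well bias) are hypothetical and chosen only for illustration.

```python
import math


def pmos_source_gate_voltage(v_dd, v_ck):
    """Source-gate voltage of a PMOSFET whose source is at v_dd (assumption)
    and whose gate is driven to the clock-high level v_ck."""
    return v_dd - v_ck


def pmos_threshold_with_well_bias(vt0, v_well, v_dd, gamma, two_phi_f):
    """|V_th| of a PMOSFET with its n-well at v_well, using the textbook
    body-effect model: |V_th| = vt0 + gamma*(sqrt(2phiF + V_BS) - sqrt(2phiF)).
    V_BS is the reverse bias between body (n-well) and source (at v_dd)."""
    v_bs = v_well - v_dd
    if v_bs < 0:
        raise ValueError("n-well must not be biased below the source voltage")
    return vt0 + gamma * (math.sqrt(two_phi_f + v_bs) - math.sqrt(two_phi_f))


def is_cut_off(v_sg, vth):
    """First-order test: the PMOSFET is off when V_SG < |V_th|."""
    return v_sg < vth


# Hypothetical, illustrative values (not from the paper).
V_DD = 1.0        # logic supply, volts
VT0 = 0.3         # zero-bias |V_th|, volts
GAMMA = 0.2       # body-effect coefficient, V^0.5
TWO_PHI_F = 0.8   # surface potential term, volts

for v_ck in (1.0, 0.5):            # full swing, then half swing
    for v_well in (1.0, 2.0):      # well tied to V_DD, then biased above it
        v_sg = pmos_source_gate_voltage(V_DD, v_ck)
        vth = pmos_threshold_with_well_bias(VT0, v_well, V_DD, GAMMA, TWO_PHI_F)
        state = "off" if is_cut_off(v_sg, vth) else "conducting"
        print(f"V_CK={v_ck:.1f} V, well={v_well:.1f} V: "
              f"V_SG={v_sg:.2f} V, |V_th|={vth:.3f} V -> {state}")
```

With these assumed numbers, a half-swing clock leaves the PMOSFET with a source-gate voltage above its threshold even when the well is biased above V_DD, which is consistent with the statement that well biasing only partially circumvents the problem.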
The high power consumption of the RCSFF therefore strongly reduces the benefit of a low clock swing for lowering the chip power dissipation.

The NAND-type Keeper F/F [5] (NDKFF, Fig. 1c) does not require a separate well and eliminates unnecessary signal transitions inside the F/F for a constant input. However, two internal nodes are subject to contention. When node QQ is low while D is high, the ON NMOSFET pull-down network must fight the ON PMOSFET (in bold in Fig. 1c) to discharge node X at the rising clock edge. Similarly, changing the state of node QQ requires fighting the positive feedback of the latch formed by inverters I6 and I7. Our simulations show that the sizing required to guarantee functionality across all process corners despite those contentions results in suboptimal speed.

1-4244-1125-4/07/$25.00 ©2007 IEEE.