Half V_DD Clock-Swing Flip-Flop with Reduced Contention
for up to 60% Power Saving in Clock Distribution
David Levacq (1), Muhammad Yazid (2), Hiroshi Kawaguchi (3), Makoto Takamiya (1), Takayasu Sakurai (1)
(1) Center for Collaborative Research, University of Tokyo, Tokyo, Japan (levacq@iis.u-tokyo.ac.jp)
(2) Fujitsu Microsolutions, Kanagawa, Japan
(3) Kobe University, Kobe, Japan
Abstract— A new low clock swing flip-flop (F/F) is proposed.
Existing low-clock-swing F/Fs consume high power, incur a speed
penalty due to contention currents, or require a large silicon area
because of the separate well needed for substrate biasing. By
reducing contention currents, our proposal efficiently mitigates
these issues. Measurements and simulations based on a 90 nm CMOS
process demonstrate reductions of active power by 71%, area by 36%,
and delay by 35% compared to previous proposals. It is shown that
combining a low-clock-swing distribution tree with the new F/F can
save up to 60% of the total clock system power.
I. INTRODUCTION
Reducing the power consumption of VLSI circuits is a growing
concern. A direct solution is to reduce the system supply voltage
V_DD. However, this can only be done at the expense of speed
degradation, which can be unacceptable in high-performance systems.
Another solution is to reduce the clock voltage swing without
reducing V_DD. This has been shown to be an efficient approach
because clock distribution is a major contributor to the power
dissipation in VLSI circuits (20 to 45% of the total chip power) [1].
If V_DD for logic circuits is kept high, this technique has little
impact on speed, provided that the flip-flops (F/Fs) can operate
efficiently under a reduced clock voltage V_CK.
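As a first-order illustration of why the clock swing matters, the sketch below uses the standard switching-power relation P = a * C * V_swing * V_supply * f. The capacitance, frequency, and supply values are hypothetical placeholders rather than figures from this work, and the model ignores short-circuit and leakage currents.

```python
def dynamic_power(c_load, v_swing, v_supply, freq, activity=1.0):
    """First-order switching power.

    Each cycle a charge c_load * v_swing is drawn from the rail at v_supply,
    so the energy per cycle is c_load * v_swing * v_supply.
    """
    return activity * c_load * v_swing * v_supply * freq


# Hypothetical values, chosen only for illustration.
C_CLK = 100e-12   # total clock-tree capacitance in farads (assumed)
F_CLK = 1e9       # clock frequency in hertz (assumed)
VDD = 1.0         # logic supply in volts (assumed)

# Full-swing clock driven from the V_DD rail.
p_full = dynamic_power(C_CLK, VDD, VDD, F_CLK)

# Half-swing clock whose drivers are supplied from a V_DD/2 rail.
p_half_low_rail = dynamic_power(C_CLK, VDD / 2, VDD / 2, F_CLK)

# Half-swing clock whose charge is still drawn from the full V_DD rail.
p_half_full_rail = dynamic_power(C_CLK, VDD / 2, VDD, F_CLK)

ratio_low_rail = p_half_low_rail / p_full
ratio_full_rail = p_half_full_rail / p_full
```

Under these assumptions the tree power scales with the square of the swing when the clock drivers run from the reduced rail, and only linearly when the charge is still taken from V_DD. The model covers the distribution tree only and says nothing about the power of the F/Fs that receive the clock.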
A first simple idea is to insert low-to-high level converters in
front of conventional F/Fs to regenerate a full-swing clock signal
[2][3] (Low-to-High converter D-F/F (LHDFF), Fig. 1a).
Theoretically, neglecting clock skew issues, this technique has no
impact on system performance since the logic critical path is not
affected. However, it does not translate into large power savings:
the voltage swing is reduced on the clock-tree distribution lines
only, while the large number of low-to-high level converters
consumes considerable power. A more efficient approach is therefore
to implement F/Fs that can directly receive a reduced-swing clock.
In [4], two separate half-swing clock signals are distributed across
the chip: the first swings from zero to half V_DD to control the
NMOSFETs, the second swings from half V_DD to V_DD to control the
PMOSFETs. While this technique has little impact on speed, the
requirement to distribute two clock signals complicates routing and
skew adjustment.
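The trade-off described above, where the saving in the tree is offset by per-F/F overhead, can be sketched with a simple power budget. All numbers below are hypothetical and serve only to show the bookkeeping; they are not measurements of LHDFF or of any other circuit in this paper.

```python
def clock_system_saving(p_tree_full, p_tree_low, n_ff, p_ff_full, p_ff_low):
    """Fractional power saving of a low-swing clock system.

    The baseline is a full-swing tree plus n_ff conventional F/Fs; the
    alternative is a low-swing tree plus n_ff low-swing-capable F/Fs.
    """
    baseline = p_tree_full + n_ff * p_ff_full
    low_swing = p_tree_low + n_ff * p_ff_low
    return 1.0 - low_swing / baseline


# Hypothetical budget (assumed values, for illustration only).
P_TREE_FULL = 40e-3   # full-swing tree power in watts
P_TREE_LOW = 10e-3    # low-swing tree power in watts
N_FF = 10_000         # number of flip-flops
P_FF_CONV = 4e-6      # conventional F/F clocking power in watts

# Case A: each F/F carries a level converter that adds 50% overhead (assumed).
saving_with_converters = clock_system_saving(
    P_TREE_FULL, P_TREE_LOW, N_FF, P_FF_CONV, 1.5 * P_FF_CONV)

# Case B: an F/F accepts the low-swing clock directly at no extra cost (assumed).
saving_direct = clock_system_saving(
    P_TREE_FULL, P_TREE_LOW, N_FF, P_FF_CONV, P_FF_CONV)
```

With these assumed numbers, the converter overhead removes most of the saving obtained in the tree, which is the motivation for F/Fs that take the reduced-swing clock directly.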
The previously proposed Reduced Clock Swing F/F [1] (RCSFF, Fig. 1b)
requires only one clock signal, swinging between 0 and a low voltage
V_CK < V_DD. However, it is clear from Fig. 1b that when the clock
signal is high, the clocked PMOSFETs cannot be efficiently turned
off, resulting in a direct current path from V_DD to ground and high
power dissipation. This difficulty can be partially circumvented by
connecting the n-well of the clocked PMOSFETs to a high voltage bias
to increase their threshold voltage (V_th) and thereby reduce the
leakage. But this requires laying out those transistors in a
separate well and generating and distributing a voltage bias above
the standard V_DD, which complicates the design and increases the
circuit area. Moreover, the voltage bias that can be applied to the
separate well is limited by reliability constraints. Last but not
least, the precharge-discharge cycles inside the F/F result in
unnecessary power dissipation when the input signal is kept constant
over several clock cycles. The high power consumption of the RCSFF
therefore strongly reduces the benefit of a low clock swing for the
chip power dissipation.
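The turn-off problem can be illustrated with a simple threshold check. The sketch below assumes the classic long-channel body-effect expression for |V_th| and uses hypothetical device parameters (threshold, body-effect coefficient, surface potential, well bias); none of these values are taken from the 90 nm process used in this work.

```python
import math


def pmos_vth_with_body_bias(vth0_abs, gamma, two_phi_f, v_well_minus_source):
    """|V_th| of a PMOSFET whose n-well sits v_well_minus_source above its source.

    Classic long-channel body-effect model; an assumption, not a fitted model.
    """
    return vth0_abs + gamma * (
        math.sqrt(two_phi_f + v_well_minus_source) - math.sqrt(two_phi_f))


def pmos_overdrive(v_source, v_gate, vth_abs):
    """Source-to-gate voltage in excess of |V_th|.

    A positive value means the device is above threshold and conducts.
    """
    return (v_source - v_gate) - vth_abs


# Hypothetical parameters, for illustration only.
VDD = 1.0         # supply in volts
VCK = 0.5         # high level of the reduced-swing clock in volts
VTH0 = 0.3        # zero-bias |V_th| in volts
GAMMA = 0.4       # body-effect coefficient in V^0.5
TWO_PHI_F = 0.8   # surface potential term in volts
WELL_BIAS = 0.5   # n-well raised this far above V_DD, in volts

# Clock high at full swing: the gate reaches V_DD and the PMOSFET is off.
od_full_swing = pmos_overdrive(VDD, VDD, VTH0)

# Clock high at reduced swing: the gate only reaches V_CK.
od_low_swing = pmos_overdrive(VDD, VCK, VTH0)

# Same reduced swing, with the n-well biased above V_DD.
vth_biased = pmos_vth_with_body_bias(VTH0, GAMMA, TWO_PHI_F, WELL_BIAS)
od_low_swing_biased = pmos_overdrive(VDD, VCK, vth_biased)
```

With these assumed values the well bias shrinks the overdrive but does not make it negative, which is consistent with the text describing the fix as only partial.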
The NAND-type Keeper F/F [5] (NDKFF, Fig. 1c) does not require a
separate well and eliminates unnecessary signal transitions inside
the F/F for a constant input. However, two internal nodes are
subject to contention. When node QQ is low while D is high, the ON
NMOSFET pull-down network must fight (contend with) the ON PMOSFET
(in bold in Fig. 1c) to discharge node X at the rising clock edge.
Similarly, changing the state of node QQ requires fighting the
positive feedback of the latch formed by inverters I6 and I7. Our
simulations show that the sizing required to guarantee functionality
across all process corners despite these contentions results in
suboptimal speed performance.
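A crude way to see why contention constrains sizing is to model the fighting pull-up and pull-down paths as two resistors in series between V_DD and ground. This is a static, linearized sketch with hypothetical on-resistances; it is not the simulation setup used in this work and ignores the transient behavior of the real circuit.

```python
def contended_node_voltage(vdd, r_pullup, r_pulldown):
    """Static voltage of a node pulled up and down at the same time."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def contention_current(vdd, r_pullup, r_pulldown):
    """Static current flowing from V_DD to ground during the fight."""
    return vdd / (r_pullup + r_pulldown)


# Hypothetical on-resistances in ohms, for illustration only.
VDD = 1.0
R_PULLUP = 20e3   # ON PMOSFET holding the node high (assumed)

# A strong pull-down network wins the fight comfortably.
v_strong = contended_node_voltage(VDD, R_PULLUP, 5e3)
i_strong = contention_current(VDD, R_PULLUP, 5e3)

# A pull-down weakened by a slow process corner leaves the node near mid-rail.
v_weak = contended_node_voltage(VDD, R_PULLUP, 20e3)
```

In this picture the pull-down must be sized so that the node settles well below the switching threshold of the following stage at every corner, and that margin is paid for in static current and in drive strength that cannot be optimized for speed.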
1-4244-1125-4/07/$25.00 ©2007 IEEE. 190