Variable-Latency Design by Function Speculation
D. Bañeres
Universitat Oberta de Catalunya
Barcelona, Spain
J. Cortadella
Universitat Politècnica de Catalunya
Barcelona, Spain
M. Kishinevsky
Strategic CAD Lab, Intel Corp.
Hillsboro, OR USA
Abstract—Variable-latency designs may improve the performance of
those circuits in which the worst-case delay paths are infrequently acti-
vated. Telescopic units emerged as a scheme to automatically synthesize
variable-latency circuits. In this paper, a novel approach is proposed that
brings three main contributions with regard to the methods used for
telescopic units: first, no multi-cycle timing analysis is required to ensure
the correctness of the circuit; second, the method can be applied to large
circuits; third, the circuit can be optimized for the most frequent input
patterns. The approach is based on finding approximations of critical
nodes in the netlist that substitute the exact behavior. Two cycles are
required when the approximations are not correct. These approximations
can be obtained by the simulation of traces applied to the circuit.
Experimental results on selected examples show a tangible speed-up
(15%) with a small area overhead (3%).
I. INTRODUCTION
The performance optimization of combinational circuits is usually
accomplished by reducing the delay of the critical paths after an
accurate timing analysis. This reduction is achieved by applying
different transformations on the circuit, such as logic restructuring
or gate sizing, that usually result in area and power overheads.
There is an interesting property that can be exploited in many
designs: the critical paths are infrequently activated. Instead of
defining the cycle time by the worst-case delay, a shorter cycle time
that covers a significant amount of input stimuli can be chosen. The
worst-case operations may not be accommodated in this clock period
and, therefore, more cycles may be required to complete them. These
variable-latency circuits are commonly used in long data-paths, such
as arithmetic circuits [1], to improve the performance.
Telescopic units [2]–[4] are a paradigm to automatically build
variable-latency circuits. An error detection function, referred to as
the hold function in [3], is computed to inform the environment at which cycle
the correct result is available at the outputs. This function externally
controls the clock period by holding the values on the registers [5]
or by adapting the clock frequency [6]. The variable-latency circuit
can also be used in an elastic design [7].
A combinational block is constructed to compute the error detec-
tion function that identifies those input patterns that require more
than one cycle to complete the execution. The function is not always
exact. A high computational cost is usually required to synthesize the
exact function that covers all these input patterns. The complexity
is equivalent to solving the false path problem, which is NP-
complete [8]. The calculation of the error detection function requires
an individual analysis of each input pattern. Existing methods
usually resort to symbolic techniques (e.g., BDDs) that simultaneously
analyze all input patterns. However, these strategies are often limited
to small or medium-size circuits.
Finally, the synthesis of telescopic units in conventional design
flows requires the definition of multi-cycle constraints, which
complicates the design and validation flows. Moreover, these constraints
are not easy to represent, since they are exercised by a complex set of
input patterns.
This paper proposes an alternative and practical method to syn-
thesize variable-latency units. The critical paths are substituted by
non-critical signals that approximate their functionality. The error
detection function checks the correctness of each approximation with
regard to the substituted signal. If the error detection function is not
activated, the cycle time is reduced due to the utilization of faster
signals in the original critical paths. If the error detection function is
activated, the exact value of the substituted signal is supplied in the
next cycle to amend the error.
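The behavior described above can be illustrated with a minimal behavioral sketch. It is not the synthesis method of this paper: the helper `speculative_eval`, its parameters, and the toy circuit (the upper sum bit of a 2-bit adder whose internal carry is speculated as constant 0) are hypothetical names and assumptions chosen only for illustration.

```python
from itertools import product
from typing import Callable, Tuple

Inputs = Tuple[int, ...]


def speculative_eval(
    inputs: Inputs,
    exact_node: Callable[[Inputs], int],
    approx_node: Callable[[Inputs], int],
    downstream: Callable[[Inputs, int], int],
) -> Tuple[int, int]:
    """Return (output, cycles) for one operation (hypothetical helper)."""
    guess = approx_node(inputs)   # fast, non-critical approximation
    exact = exact_node(inputs)    # slow signal on the original critical path
    error = guess != exact        # role of the error detection function
    if not error:
        return downstream(inputs, guess), 1   # speculation was correct
    return downstream(inputs, exact), 2       # exact value used in the next cycle


# Toy circuit (assumption for illustration only): inputs = (a0, b0, a1, b1).
exact_carry = lambda x: x[0] & x[1]            # carry out of bit 0
zero_guess = lambda x: 0                       # speculate "no carry"
sum_bit_1 = lambda x, carry: x[2] ^ x[3] ^ carry

results = {
    x: speculative_eval(x, exact_carry, zero_guess, sum_bit_1)
    for x in product((0, 1), repeat=4)
}
two_cycle_cases = sum(1 for _, cycles in results.values() if cycles == 2)
print(two_cycle_cases, "of", len(results), "patterns need two cycles")
```

In this sketch the output is always correct; only the number of cycles depends on whether the approximation matched the exact signal.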
A similar technique has been previously applied to some arithmetic
circuits [9]–[11]. An adder is a circuit with a long critical path (the
carry signal) that can be easily approximated with near-zero effect on
the correctness of the result. In this paper, we generalize the technique
for the automatic synthesis of any circuit.
The rest of the paper is organized as follows. Section II gives
an overview of telescopic units and presents the main contributions
of this paper. Section III introduces the basic terminology that will
be used in the paper. The details of the variable-latency scheme are
explained in Section IV, the technique is discussed in Section V and
the algorithm to optimize the cycle time is presented in Section VI.
Finally, Section VII explains how a variable-latency circuit is built
and Section VIII reports the experimental results.
II. OVERVIEW AND CONTRIBUTIONS
This section introduces the basics of the scheme for the design
of variable-latency units and its main differences from the existing
approaches for telescopic units.
Figure 1(a) shows an example of the computation of the error detection
function ($F_{err}$) for a telescopic unit. The example implements
a 6-bit ripple carry adder in which each box represents a full adder.
Assuming that the delay of each full adder is 1 unit, the critical
path is 6 units. There are theoretical studies [11] that demonstrate
that delays larger than 4 units are rarely activated. A possible error
detection function for this cycle time is
$F_{err} = (A_4 \oplus B_4)(A_3 \oplus B_3)$,
which describes the condition when the carry $c_2$ is 1 and the 3rd
and 4th full adders propagate a carry. If the error detection function
is activated, the sum operation requires two cycles. Note that this
speculation function is not exact since it does not consider the carry
propagation at the least-significant bits of the addition. Assuming a
uniform distribution of the inputs, the probability of $F_{err}$ is 0.25,
while the probability of an exact error function would be 0.1875.
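The 0.25 figure can be checked by exhaustive enumeration, as in the sketch below. Two assumptions are made that the text does not state explicitly: operand bits are indexed from 1 (so that $A_3$, $A_4$ belong to the 3rd and 4th full adders), and a mispredicted addition costs exactly two short cycles with no other overhead. The names `f_err`, `T_SHORT`, and `T_WORST` are illustrative, and the resulting average time is a toy estimate, not a result reported in this paper.

```python
from itertools import product

N_BITS = 6    # width of the ripple carry adder in the example
T_WORST = 6   # worst-case delay, in full-adder delay units
T_SHORT = 4   # shortened cycle time, in full-adder delay units


def f_err(a, b):
    """F_err = (A4 xor B4)(A3 xor B3); a[i], b[i] hold bit i (1-based)."""
    return (a[4] ^ b[4]) & (a[3] ^ b[3])


def probability_of_f_err():
    """Fraction of all input pairs for which F_err is 1 (uniform inputs)."""
    hits = total = 0
    for bits in product((0, 1), repeat=2 * N_BITS):
        a = (None,) + bits[:N_BITS]   # pad so that a[1]..a[6] are valid
        b = (None,) + bits[N_BITS:]
        hits += f_err(a, b)
        total += 1
    return hits / total


p_err = probability_of_f_err()
# Average time per addition under the two-cycle-on-error assumption.
avg_time = T_SHORT * (1 + p_err)

print(p_err)                 # 0.25
print(avg_time, T_WORST)     # 5.0 6
```

Under these assumptions the average time per addition is 5 units against 6 units for the fixed-latency adder, which shows why a lower error probability translates directly into a shorter effective cycle time.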
The scheme proposed in this paper is based on the speculation
of some values that approximate signals in the critical paths. Two
examples of the computation of the error detection function by
speculation are presented in Fig. 1(b)-(c). Assume that the carry $c_2$
is selected as a speculation point. The objective is to find a simple
function that approximates the behavior of the carry signal $c_2$ with
high probability. Figure 1(b) shows the selection of the constant zero.
In terms of area and delay, it is a good function because there is no
overhead, but the carry $c_2$ is 1 with probability 0.375. Figure 1(c)
shows another approximation. The selected function is $A_2 \cdot B_2$, which
decreases the probability of error down to 0.125. The error detection
function will detect the errors by comparing the approximation with
the exact function.
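Both probabilities can be reproduced with a short enumeration. The sketch assumes that $c_2$ is the carry out of the second full adder, that bits are indexed from 1, and that the first full adder has no carry-in; the function names are illustrative only.

```python
from itertools import product


def carry_c2(a1, b1, a2, b2):
    """Exact carry out of the second full adder (assumes no carry-in)."""
    c1 = a1 & b1                           # carry out of full adder 1
    return (a2 & b2) | ((a2 ^ b2) & c1)    # generate OR (propagate AND c1)


def mismatch_probability(approx):
    """Probability that approx differs from c2 under uniform inputs."""
    cases = list(product((0, 1), repeat=4))
    wrong = sum(carry_c2(*v) != approx(*v) for v in cases)
    return wrong / len(cases)


# Fig. 1(b): speculate c2 as constant zero.
p_const_zero = mismatch_probability(lambda a1, b1, a2, b2: 0)
# Fig. 1(c): speculate c2 as A2 AND B2.
p_and = mismatch_probability(lambda a1, b1, a2, b2: a2 & b2)

print(p_const_zero)   # 0.375
print(p_and)          # 0.125
```

The approximation $A_2 \cdot B_2$ fails only when the second full adder propagates a carry generated by the first one, which accounts for the 2 mismatching patterns out of 16.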
978-3-9810801-5-5/DATE09 © 2009 EDAA