Variable-Latency Design by Function Speculation

D. Bañeres, Universitat Oberta de Catalunya, Barcelona, Spain
J. Cortadella, Universitat Politècnica de Catalunya, Barcelona, Spain
M. Kishinevsky, Strategic CAD Lab, Intel Corp., Hillsboro, OR, USA

Abstract—Variable-latency designs may improve the performance of circuits in which the worst-case delay paths are infrequently activated. Telescopic units emerged as a scheme to automatically synthesize variable-latency circuits. In this paper, we propose a novel approach that brings three main contributions with respect to the methods used for telescopic units. First, no multi-cycle timing analysis is required to ensure the correctness of the circuit. Second, the method can be applied to large circuits. Third, the circuit can be optimized for the most frequent input patterns. The approach is based on finding approximations of critical nodes in the netlist that substitute for their exact behavior. Two cycles are required when the approximations are not correct. These approximations can be obtained by simulating traces applied to the circuit. Experimental results on selected examples show a tangible speed-up (15%) with a small area overhead (3%).

I. INTRODUCTION

The performance optimization of combinational circuits is usually accomplished by reducing the delay of the critical paths after an accurate timing analysis. This reduction is achieved by applying different transformations to the circuit, such as logic restructuring or gate sizing, which usually result in area and power overheads.

Many designs have an interesting property that can be exploited: the critical paths are infrequently activated. Instead of defining the cycle time by the worst-case delay, we can choose a shorter cycle time that covers a significant fraction of the input stimuli. The worst-case operations may not fit in this clock period and, therefore, may require more cycles to complete.
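To see why a shorter cycle can pay off, consider a simple first-order model. This model is our own illustrative assumption, not taken from the paper's experiments. Each operation completes in one short cycle $T_c$ unless it exercises a long path, which happens with probability $P_{err}$. In that case it takes one extra cycle. The delay and area of the error detection logic are ignored. With $T_{wc}$ the worst-case delay of the original circuit:

```latex
T_{avg} = (1 - P_{err})\,T_c + P_{err}\,(2\,T_c) = T_c\,(1 + P_{err}),
\qquad
\text{speed-up} = \frac{T_{wc}}{T_c\,(1 + P_{err})}
```

For instance, with the hypothetical values $T_c = 0.8\,T_{wc}$ and $P_{err} = 0.1$, the model gives $T_{avg} = 0.88\,T_{wc}$, a speed-up of about 1.14. The gain vanishes once $P_{err}$ approaches $T_{wc}/T_c - 1$.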
These variable-latency circuits are commonly used in long data-paths, such as arithmetic circuits [1], to improve performance. Telescopic units [2]–[4] are a paradigm for automatically building variable-latency circuits. An error detection function, referred to as the hold function in [3], is computed to tell the environment in which cycle the correct result is available at the outputs. This function externally controls the clock period, either by holding the values in the registers [5] or by adapting the clock frequency [6]. The variable-latency circuit can also be used in an elastic design [7].

A combinational block is constructed to compute the error detection function, which identifies the input patterns that require more than one cycle to complete. This function is not always exact. Synthesizing the exact function that covers all these input patterns usually has a high computational cost, because the problem is equivalent to the false path problem, which is NP-complete [8]. Computing the error detection function requires an individual analysis of each input pattern. The proposed methods usually resort to symbolic techniques (e.g., BDDs) that analyze all input patterns simultaneously. However, these strategies are often limited to small or medium-size circuits.

Finally, synthesizing telescopic units in conventional design flows requires the definition of multi-cycle constraints, which complicate the design and validation flows. Moreover, these constraints are not easy to represent, since they are exercised by a complex set of input patterns.

This paper proposes an alternative and practical method to synthesize variable-latency units. The critical paths are substituted by non-critical signals that approximate their functionality. The error detection function checks the correctness of each approximation with respect to the substituted signal.
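The speculation scheme can be summarized with a small behavioral model. This is a sketch under our own assumptions, not the paper's implementation. The circuit is split into a hypothetical `rest_of_circuit` that consumes one internal signal. That signal has a slow `exact_signal` implementation and a fast `approx_signal` implementation, and both names are illustrative. The error detection function is modelled as a comparison of the two values. The returned cycle count is 1 when the approximation is correct and 2 otherwise.

```python
from typing import Callable, Tuple, TypeVar

X = TypeVar("X")  # primary inputs
S = TypeVar("S")  # value of the speculated internal signal
Y = TypeVar("Y")  # primary outputs


def speculative_eval(
    x: X,
    exact_signal: Callable[[X], S],
    approx_signal: Callable[[X], S],
    rest_of_circuit: Callable[[X, S], Y],
) -> Tuple[Y, int]:
    """Hypothetical behavioral model of one speculated signal.

    Returns (output, cycles). The output always equals
    rest_of_circuit(x, exact_signal(x)); only the latency varies.
    """
    s_hat = approx_signal(x)  # fast approximation used in the first cycle
    s = exact_signal(x)       # slow exact value (in hardware, only needed in cycle 2)
    f_err = s_hat != s        # error detection: approximation vs. exact signal
    if not f_err:
        return rest_of_circuit(x, s_hat), 1
    return rest_of_circuit(x, s), 2  # exact value supplied in the next cycle


# Usage with toy functions: the "circuit" adds two numbers and a carry-like signal.
result, cycles = speculative_eval((1, 2), lambda x: 1, lambda x: 0, lambda x, s: x[0] + x[1] + s)
print(result, cycles)  # 4 2
```

The model evaluates both signals eagerly for simplicity. In hardware, the benefit comes from the fact that only the fast approximation lies on the path that determines the short cycle time.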
If the error detection function is not activated, the cycle time is reduced thanks to the use of faster signals in the original critical paths. If the error detection function is activated, the exact value of the substituted signal is supplied in the next cycle to correct the error. A similar technique has previously been applied to some arithmetic circuits [9]–[11]. An adder is a circuit with a long critical path (the carry signal) that can be easily approximated with a near-zero effect on the correctness of the result. In this paper, we generalize the technique to the automatic synthesis of any circuit.

The rest of the paper is organized as follows. Section II gives an overview of telescopic units and presents the main contributions of this paper. Section III introduces the basic terminology used in the paper. The details of the variable-latency scheme are explained in Section IV, the technique is discussed in Section V, and the algorithm to optimize the cycle time is presented in Section VI. Finally, Section VII explains how a variable-latency circuit is built, and Section VIII reports the experimental results.

II. OVERVIEW AND CONTRIBUTIONS

This section introduces the basics of the scheme for the design of variable-latency units and its main differences from the existing approaches for telescopic units.

Figure 1(a) shows an example of the computation of the error detection function ($F_{err}$) for a telescopic unit. The example implements a 6-bit ripple-carry adder in which each box represents a full adder. Assuming that the delay of each full adder is 1 unit, the critical path is 6 units. Theoretical studies [11] show that delays larger than 4 units are rarely activated. For a cycle time of 4 units, a possible error detection function is $F_{err} = (A_4 \oplus B_4)(A_3 \oplus B_3)$. This function conservatively covers the condition in which the carry $c_2$ is 1 and the 3rd and 4th full adders propagate it.
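The probability quoted for $F_{err}$ can be checked by exhaustive enumeration. The sketch below rests on several assumptions. The full adders are numbered FA1 (least-significant) to FA6, the adder has no carry-in, and $c_i$ denotes the carry leaving FA$i$. These conventions agree with the probabilities the text gives for $c_2$. The helpers `bit`, `carry_out` and `f_err` are hypothetical names used only for illustration.

```python
from itertools import product

N = 6  # 6-bit ripple-carry adder: FA1 (LSB) .. FA6, no carry-in (assumption)


def bit(v: int, i: int) -> int:
    """Bit i of v, 1-indexed so that i = 1 is the least-significant bit."""
    return (v >> (i - 1)) & 1


def carry_out(a: int, b: int, i: int) -> int:
    """Carry c_i leaving full adder FA_i, obtained by rippling from FA1."""
    c = 0
    for k in range(1, i + 1):
        ak, bk = bit(a, k), bit(b, k)
        c = (ak & bk) | ((ak ^ bk) & c)  # generate, or propagate the incoming carry
    return c


def f_err(a: int, b: int) -> int:
    """Telescopic error detection: F_err = (A4 xor B4)(A3 xor B3)."""
    return (bit(a, 4) ^ bit(b, 4)) & (bit(a, 3) ^ bit(b, 3))


pairs = list(product(range(2 ** N), repeat=2))  # all inputs, uniformly weighted
p_err = sum(f_err(a, b) for a, b in pairs) / len(pairs)
print(p_err)  # 0.25

# F_err is a conservative cover: it is raised whenever c_2 = 1 and FA3, FA4 both
# propagate (it also fires when c_2 = 0, which is why it is not exact).
assert all(
    f_err(a, b)
    for a, b in pairs
    if carry_out(a, b, 2) and bit(a, 3) ^ bit(b, 3) and bit(a, 4) ^ bit(b, 4)
)
```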
If the error detection function is activated, the sum operation requires two cycles. Note that this error detection function is not exact, since it ignores the carry propagation at the least-significant bits of the addition. Assuming a uniform distribution of the inputs, the probability of $F_{err}$ is 0.25, while the probability of an exact error function would be 0.1875.

The scheme proposed in this paper is based on speculating values that approximate signals in the critical paths. Figures 1(b) and 1(c) show two examples of computing the error detection function by speculation. Assume that the carry $c_2$ is selected as a speculation point. The objective is to find a simple function that approximates the behavior of the carry signal $c_2$ with high probability. Figure 1(b) shows the selection of the constant zero. In terms of area and delay, it is a good function because it has no overhead, but the carry $c_2$ is 1 with probability 0.375. Figure 1(c) shows another approximation, $A_2 \cdot B_2$, which decreases the probability of error to 0.125. The error detection function detects the errors by comparing the approximation with the exact function.

978-3-9810801-5-5/DATE09 © 2009 EDAA
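The error probabilities of the two candidate approximations of $c_2$ can be reproduced by enumeration. The sketch below uses the same assumed conventions as a ripple-carry model: FA1 is the least-significant full adder, there is no carry-in, and $c_2$ is the carry leaving FA2. It also checks that a correct speculation yields the exact sum in a single pass. The function names are hypothetical, and the code is an illustration rather than the paper's synthesis flow.

```python
from itertools import product

N = 6  # adder width (assumption: FA1 is the least-significant full adder, carry-in 0)


def bit(v: int, i: int) -> int:
    """Bit i of v, 1-indexed so that i = 1 is the least-significant bit."""
    return (v >> (i - 1)) & 1


def c2_exact(a: int, b: int) -> int:
    """Exact carry out of FA2: c1 = A1*B1, c2 = A2*B2 + (A2 xor B2)*c1."""
    c1 = bit(a, 1) & bit(b, 1)
    return (bit(a, 2) & bit(b, 2)) | ((bit(a, 2) ^ bit(b, 2)) & c1)


def speculative_sum(a: int, b: int, c2: int) -> int:
    """Sum computed with a (possibly speculated) carry c2 entering FA3."""
    low = (a % 4 + b % 4) % 4        # sum bits produced by FA1 and FA2
    high = (a >> 2) + (b >> 2) + c2  # FA3..FA6 plus the final carry-out
    return low + (high << 2)


# Candidate approximations of c2 from Fig. 1(b) and Fig. 1(c).
approximations = {
    "zero": lambda a, b: 0,
    "A2*B2": lambda a, b: bit(a, 2) & bit(b, 2),
}

pairs = list(product(range(2 ** N), repeat=2))  # uniform input distribution
error_prob = {}
for name, approx in approximations.items():
    errors = 0
    for a, b in pairs:
        c2_hat = approx(a, b)
        if c2_hat != c2_exact(a, b):  # F_err: the approximation differs from c2
            errors += 1
        else:                         # correct speculation: exact sum in one cycle
            assert speculative_sum(a, b, c2_hat) == a + b
    error_prob[name] = errors / len(pairs)

print(error_prob)  # {'zero': 0.375, 'A2*B2': 0.125}
```

The same loop could score any other candidate function. Replacing the exhaustive `pairs` with a recorded input trace would estimate the error probability for the workload actually seen by the circuit.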