Hindawi Publishing Corporation
VLSI Design
Volume 2013, Article ID 785281, 12 pages
http://dx.doi.org/10.1155/2013/785281

Research Article

A General Design Methodology for Synchronous Early-Completion-Prediction Adders in Nano-CMOS DSP Architectures

Mauro Olivieri and Antonio Mastrandrea

Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy

Correspondence should be addressed to Mauro Olivieri; olivieri@diet.uniroma1.it

Received 12 September 2012; Revised 2 December 2012; Accepted 5 December 2012

Academic Editor: Meng-Hsueh Chiang

Copyright © 2013 M. Olivieri and A. Mastrandrea. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Synchronous early-completion-prediction adders (ECPAs) are used for high clock rate and high-precision DSP datapaths, as they allow a dominant amount of single-cycle operations even if the worst-case carry propagation delay is longer than the clock period. Previous works have also demonstrated ECPA advantages for average leakage reduction and NBTI effects reduction in nanoscale CMOS technologies. This paper illustrates a general systematic methodology to design ECPA units, targeting nanoscale CMOS technologies, which is not available in the current literature yet. The method is fully compatible with standard VLSI macrocell design tools and standard adder structures and includes automatic definition of critical test patterns for postlayout verification. A design example is included, reporting speed and power data superior to previous works.

1. Introduction

Fast integer adders are an essential component of most DSP datapaths.
Synchronous early-completion-prediction adders (ECPAs) [1], also known as variable-latency adders [2], have been introduced for high clock rate and high-precision datapaths, as they allow single-cycle operations even if the clock period is shorter than the worst-case carry propagation delay. Thanks to the data dependency of actual carry chain propagation, the occurrence of multicycle operations can be kept statistically rare, thus allowing an overall speed improvement. The industrial effectiveness of the idea was first proven by the design of a full-custom ECPA unit for a DSP datapath at Toshiba Labs [1]. The logic foundation of that adder is shown in [3]. An extension to multiply unit design has been shown in [4]. The works in [2] and [5] have recently pointed out the potential of variable-latency adders in nano-CMOS addition units for reducing average leakage power consumption and improving robustness to NBTI faults occurring in nanoscale technologies.

An ECPA consists of a conventional adder plus a completion-prediction logic unit (Figure 1). The prediction unit estimates the actual critical path length in the adder depending on the operand values and hence the cycle count of the operation for the target cycle time. This approach differs from asynchronous completion detection units [6–8], as it is based on a totally synchronous scheme.

From the design point of view, the logic specification of the prediction function depends on the target cycle time and on the estimation of the variable completion time of the adder, in order to define the cycle count output. Moreover, the speed of the prediction unit is critical, since the prediction must always be completed in a single cycle in order to be effective.

No general design methodology for ECPA VLSI cores has been proposed yet. In [3], Lee and Asada analyzed the design problem on the basis of 2-input-gate unit delay within a ripple carry adder structure. In [1], Kondo et al.
address the full-custom design case of a fast carry-select structure. In [9], Nowick et al. deal with the design of speculative-completion adders, similar in principle to ECPA but again