Process Variation Aware Timing Optimization through Transistor Sizing in Dynamic CMOS Logic

Kumar Yelamarthi and Chien-In Henry Chen
Department of Electrical Engineering
Wright State University
Dayton, OH 45435, USA
E-mail: {yelamarthi.2, henry.chen}@wright.edu

Abstract

A major challenge in the design of microprocessor circuits is transistor sizing in dynamic CMOS logic, owing to the increased number of channel-connected transistors on various paths of the design and the increased magnitude of process variations in nanometer processes. This paper proposes a process variation aware transistor sizing algorithm for dynamic CMOS logic. The efficiency of this algorithm is illustrated first by a 2-b weighted binary-to-thermometric converter, whose critical path delay was reduced from 355 to 157 ps, a 55.77% delay improvement, and whose delay uncertainty due to process variation was reduced by 60.75%. A 4-b unity weight binary-to-thermometric converter was also optimized: its critical path delay was reduced from 152 to 103 ps, a 32.23% delay improvement, and its delay uncertainty was reduced by 63.6%. Applying the proposed timing optimization algorithm to a mixed-dynamic-static CMOS 64-bit adder, the critical path delay and the power-delay-product were optimized to 632 ps and 84.17 pJ, respectively.

1. Introduction

The performance of microprocessors has traditionally been driven by CMOS technology and microarchitectural improvements [1]. Using custom dynamic CMOS circuits in microprocessors has increased timing performance significantly over static CMOS circuits [1-2]. A major challenge in timing optimization of dynamic CMOS logic is transistor sizing. The elevated complexity of timing optimization is caused mainly by effects such as charge sharing, noise immunity, environmental and semiconductor process variations, and leakage current.
For example, research has demonstrated that process variations may cause up to 30% variation in chip frequency, along with 20X variation in chip leakage [12]. Continued scaling of the CMOS process has significantly increased the number and magnitude of process variations. The magnitude of intra-die channel length variations was estimated to increase from 35% of total variation in 130 nm to 60% in 70 nm, and the variation in wire width, height, and thickness was also expected to increase from 25% to 35% [9]. This further highlights the importance of accounting for process variations in delay estimates of high-performance circuits.

Transistor sizing has been studied in [3-6], but most algorithms focus on static CMOS circuits and on technologies using dual threshold voltages. TILOS [4] presented an iterative transistor sizing algorithm but does not guarantee convergence of the timing optimization. MINFLOTRANSIT [5] proposed an iterative relaxation method but requires directed acyclic graphs to be generated iteratively for timing optimization.

Many techniques have been proposed to reduce the effect of process variations on chip performance [8-12]. Most deal with statistical variations and are not optimal for designs with a large number of parameter variations. The Adaptive Body Biasing (ABB) technique presented in [10, 12] is applied post-silicon: each die receives a unique bias voltage, which reduces the variance of the frequency variation. However, this method is not feasible for addressing intra-die variations, as each block in the design would require a unique bias voltage. Another limitation is that leakage power increases substantially because of the reduced threshold voltage. In [11], keepers were introduced to compensate for the effect of process variation, but the method required additional hardware to program the keeper transistor.
As variations in $\Delta L$ and $\Delta W$ are random and are predicted to be the major contributors to total variation in nanometer CMOS processes [9], design tools need to account for random variations as well. The Monte Carlo method accounts for both systematic and random variations. Although it is slow, it is accurate when the number of sources of variation is large [13]. It can also be extended to incorporate crosstalk and IR drop into the simulation. In addition to timing optimization by reducing delay, chip performance has to be improved by reducing the delay uncertainty due to process variations, as defined in (1), where $T_{Max}$ is the worst-case delay and $T_{Min}$ is the best-case delay from Monte Carlo simulations.

$$T_{uncertainty} = T_{Max} - T_{Min} \qquad (1)$$
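As an illustration of (1), the following minimal Python sketch computes the delay uncertainty from a set of sampled path delays. It is not the simulation flow used in this work: the function names `delay_uncertainty`, `sample_delays`, and `uncertainty_improvement` are hypothetical, and the Gaussian perturbation of a nominal delay is only an assumed stand-in for a circuit-level Monte Carlo run.

```python
import random


def delay_uncertainty(delays):
    """Return T_Max - T_Min, as in (1), for a list of sampled path delays."""
    if not delays:
        raise ValueError("need at least one delay sample")
    return max(delays) - min(delays)


def sample_delays(nominal_ps, sigma_fraction, runs, seed=0):
    """Hypothetical stand-in for Monte Carlo circuit simulation.

    ASSUMPTION: each run perturbs the nominal delay by a Gaussian
    relative error. A real flow would instead perturb device parameters
    (e.g. channel length and width) and re-simulate the circuit.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [nominal_ps * (1.0 + rng.gauss(0.0, sigma_fraction))
            for _ in range(runs)]


def uncertainty_improvement(before_ps, after_ps):
    """Percentage reduction in delay uncertainty between two designs."""
    return 100.0 * (before_ps - after_ps) / before_ps


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from this paper.
    unsized = sample_delays(nominal_ps=355.0, sigma_fraction=0.05, runs=1000)
    sized = sample_delays(nominal_ps=157.0, sigma_fraction=0.05, runs=1000)
    t_before = delay_uncertainty(unsized)
    t_after = delay_uncertainty(sized)
    print(f"uncertainty before sizing: {t_before:.1f} ps")
    print(f"uncertainty after sizing:  {t_after:.1f} ps")
    print(f"improvement: {uncertainty_improvement(t_before, t_after):.1f} %")
```

Taking the difference of the extreme samples follows (1) directly; note that this estimate depends on the number of Monte Carlo runs, since more runs tend to widen the observed range.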