Modeling the Effect of Process Variations on the Delay and Power of the Digital Circuit Using Fast Simulators

Amirali Amirsoleimani 1, H. Soleimani 1, A. Ahmadi 1, M. Bavandpour 2, M. Zwolinski 3

1 Electrical Engineering Department, Razi University, Kermanshah, Iran
2 Faculty of Electrical Engineering, Sharif University of Technology, Tehran, Iran
3 School of Electronic and Computer Science, University of Southampton, Southampton, UK

amirali.amirsoleimani@ieee.org, {hsoleimani,aahmadi}@razi.ac.ir, bavandpour@ee.sharif.ir, {km3, mz}@ecs.soton.ac.uk

Abstract: Process variation has an increasingly dramatic effect on delay and power as process geometries shrink. Even if the amount of variation remains the same as in previous generations, it accounts for a greater percentage of process geometries as they get smaller. An accurate prediction of path delay and power variability for real digital circuits in current technologies is therefore very important; its main drawback, however, is the high runtime cost. In this paper, we present a new fast EDA tool which accelerates Monte Carlo based statistical static timing analysis (SSTA) for complex digital circuits. Parallel platforms such as Message Passing Interface (MPI) and POSIX® Threads, as well as the GPU-based CUDA platform, are a natural fit for this analysis. Using these platforms, Monte Carlo based SSTA for complex digital circuits at 32, 45 and 65 nm has been performed. The pin-to-output delay and power distributions for all basic gates are extracted using a memory lookup from Hspice, and the results are then extended to the complex digital circuit in a hierarchical manner on the parallel platforms. Results show that the GPU-based platform has the highest performance (speedup of 19x). The correctness of the Monte Carlo based SSTA implemented on a GPU has been verified by comparing its results with a CPU-based implementation.
Keywords: Monte Carlo, Statistical timing, Graphic Processing Unit (GPU), Message Passing Interface (MPI), POSIX® Thread (Pthread).

I. INTRODUCTION

Process variation is becoming increasingly significant as the minimum feature sizes of VLSI fabrication processes rapidly decrease. In particular, the resulting increase in delay and power variations has strongly affected the timing yields and maximum operating frequencies of designs. Variations are categorized as random or systematic. Random variations are independent of the locations of transistors within a chip, whereas systematic variations depend on location.

Static timing analysis (STA) is used in a conventional VLSI design flow to estimate circuit delay and the maximum operating frequency of the design. Statistical STA (SSTA) was developed to deal with variations and move beyond the deterministic nature of traditional STA techniques. The main idea of SSTA is to include the effect of variations in order to analyze circuit delay more accurately. Monte Carlo based SSTA is an accurate method for performing SSTA: it generates N samples of the gate delay random variables and executes a static timing analysis run for each sample. Finally, the results are aggregated to produce the full circuit delay distribution.

SSTA algorithms can be broadly categorized as block-based or path-based, and the level of accuracy needed should determine which is used. Block-based reporting is much faster, since it merely propagates one distribution from each stage to the next until it reaches the endpoint, using a statistical MAX operation for setup or a statistical MIN operation for hold. The resulting arrival time PDF is an accurate statistical approximation of the actual arrival time PDF for all paths to a given endpoint. For timing analysis of a large design or block, a block-based approach should be used in order to minimize runtime [1-2].
In such cases, only an approximation is computed, using the upper bound or lower bound of the PDF calculation. The more accurate path-based analysis can then be used selectively to gain better accuracy. The path-based approach propagates the PDF for all possible paths to each endpoint and combines them to create a joint PDF (JPDF) of the arrival time.

For both path-based and block-based analysis techniques, numerous works have tried to present an accurate but fast variability simulator [3-9]. However, the main drawback of these simulators is the high runtime cost. More performance can be achieved by using parallel platforms, which execute portions of the process on more than one processor core simultaneously. Several programming libraries, such as Pthread and MPI, have been developed to simplify the management of, and access to, CPU cores. More recently, Graphic Processing Units (GPUs) have appeared as a powerful and affordable computational platform with a diverse range of applications [10]. Structurally, they are organized as arrays of highly threaded streaming multiprocessors [11].

Monte Carlo based SSTA fits the requirements of parallel processing well: the generation of samples and the corresponding static timing analysis of a real digital circuit can be executed in parallel, with no data dependency between samples. We carry this independence over to the high-level simulation of a real digital circuit, so that Monte Carlo based SSTA can be executed in parallel for functional units at the same logic level.

As mentioned before, the greatest drawback in simulating process variations for complex digital circuits is the high runtime cost. To overcome this problem, this paper presents a path-based SSTA paradigm which exploits the parallelism in the Monte Carlo approach to SSTA.
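As a rough illustration of why Monte Carlo based SSTA parallelizes so naturally, the following Python sketch draws gate-delay samples, runs one deterministic STA pass per sample, and aggregates the results. It is not the tool presented in this paper: the five-gate netlist, its mean and sigma values, the Gaussian delay model, and the names sta_once, run_chunk and monte_carlo_ssta are all assumptions introduced only for this example.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical netlist: gate -> (fan-in gates, mean delay in ps, sigma in ps).
# Gates are listed in topological order; all numbers are made up for illustration.
NETLIST = {
    "g1": ([], 20.0, 2.0),
    "g2": ([], 25.0, 2.5),
    "g3": (["g1", "g2"], 30.0, 3.0),
    "g4": (["g2"], 15.0, 1.5),
    "out": (["g3", "g4"], 10.0, 1.0),
}


def sta_once(gate_delays):
    """One deterministic STA run: arrival = latest fan-in arrival + gate delay."""
    arrival = {}
    for gate, (fanin, _, _) in NETLIST.items():
        latest_input = max((arrival[g] for g in fanin), default=0.0)
        arrival[gate] = latest_input + gate_delays[gate]
    return arrival["out"]


def run_chunk(job):
    """Draw n_samples gate-delay vectors and run STA on each of them."""
    seed, n_samples = job
    rng = random.Random(seed)  # private generator, so chunks share no state
    results = []
    for _ in range(n_samples):
        # Assumed model: independent Gaussian gate delays, clipped at zero.
        delays = {
            gate: max(0.0, rng.gauss(mu, sigma))
            for gate, (_, mu, sigma) in NETLIST.items()
        }
        results.append(sta_once(delays))
    return results


def monte_carlo_ssta(n_samples=4000, n_workers=4, seed=0):
    """Split the samples into independent chunks and aggregate the statistics.

    If n_samples is not a multiple of n_workers, the remainder is dropped.
    """
    chunk = n_samples // n_workers
    jobs = [(seed + i, chunk) for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        samples = [d for part in pool.map(run_chunk, jobs) for d in part]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    mean_delay, sigma_delay = monte_carlo_ssta()
    print(f"circuit delay: mean = {mean_delay:.2f} ps, sigma = {sigma_delay:.2f} ps")
```

Because no chunk reads data produced by another, the same decomposition maps onto Pthread workers, MPI ranks, or CUDA threads. In standard CPython the thread pool only illustrates the structure and is not expected to yield a real speedup for this CPU-bound loop.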
Delay and power variations for complex digital circuits have been simulated in a hierarchical manner on single-core and multi-core CPU platforms and on a GPU platform, and their performance is compared. For this purpose, the delay and power distributions for all basic gates are extracted using a memory lookup from Hspice, and the results are then extended to real complex digital circuits.

The remainder of this paper is organized as follows. Section II details our approach for implementing Monte Carlo based SSTA. Section III details the different parallel platforms and their simulation methodology. Section IV presents results and the paper concludes in Section V.