IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 12, DECEMBER 2005

Modeling of Failure Probability and Statistical Design of SRAM Array for Yield Enhancement in Nanoscaled CMOS

Saibal Mukhopadhyay, Student Member, IEEE, Hamid Mahmoodi, Student Member, IEEE, and Kaushik Roy, Fellow, IEEE

Abstract: In this paper, we have analyzed and modeled failure probabilities (access-time failure, read/write failure, and hold failure) of static random-access memory (SRAM) cells due to process-parameter variations. A method to predict the yield of a memory chip based on the cell-failure probability is proposed. A methodology to statistically design the SRAM cell and the memory organization is proposed using the failure-probability and the yield-prediction models. The developed design strategy statistically sizes different transistors of the SRAM cell and optimizes the number of redundant columns to be used in the SRAM array, to minimize the failure probability of a memory chip under area and leakage constraints. The developed method can be used in an early stage of a design cycle to enhance memory yield in the nanometer regime.

Index Terms: Leakage, performance, random dopant fluctuation (RDF), robustness, static random-access memory (SRAM), yield.

Manuscript received September 14, 2003; revised December 2, 2004. This work was supported in part by the Semiconductor Research Corporation, the Defense Advanced Research Projects Agency Power Aware Computing and Communication (DARPA PACC) Program, Intel, and IBM Corporation. This paper was recommended by Associate Editor S. Sapatnekar. The authors are with the Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (e-mail: sm@ecn.purdue.edu; mahmoodi@ecn.purdue.edu; kaushik@ecn.purdue.edu). Digital Object Identifier 10.1109/TCAD.2005.852295

I. INTRODUCTION

The random variations in process parameters have emerged as a major design challenge in circuit design in the nanometer regime [1]–[3]. The sources of the inter-die and the intra-die variations in process parameters include variations in channel length, channel width, oxide thickness, threshold voltage, line-edge roughness, and random dopant fluctuations (RDF), i.e., the random variations in the number and location of dopant atoms in the channel region of the device, which result in random variations in the transistor threshold voltage [1]–[5]. These different sources of variation result in significant variation in the delay and the leakage of digital circuits [1]–[5]. The inter-die variation in a parameter, say the threshold voltage (V_t), modifies the value of that parameter for all transistors in a die in the same direction (i.e., the threshold voltages of all the transistors either increase or decrease). This principally results in a spread in the delay and the leakage, but does not cause a mismatch between different transistors in a die. On the other hand, the intra-die variations shift the process parameters of different transistors in a die in different directions (e.g., the V_t of some transistors increases whereas that of others decreases). The intra-die (or on-die) variations can be systematic (i.e., the shift in a parameter of one transistor depends on the shift of that parameter in a neighboring transistor) or random (i.e., the shifts in a parameter of two neighboring transistors are completely independent). An example of systematic intra-die variation is a spatially correlated change in the channel length of different transistors of a die. The RDF-induced V_t variation is a classic example of random intra-die variation. Systematic variation does not result in large differences between two transistors that are in close spatial proximity.
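The distinction between inter-die and random intra-die variation can be illustrated with a small Monte Carlo sketch. This is only an illustration under stated assumptions, not the authors' model: both components are assumed to be zero-mean Gaussian V_t shifts, the systematic (spatially correlated) intra-die component is left out, and the function name `sample_die_vt` and all numerical values (nominal V_t, standard deviations, transistor count) are hypothetical.

```python
import random

def sample_die_vt(n_transistors, vt_nominal, sigma_inter, sigma_intra, rng):
    """Return the threshold voltages (in volts) of all transistors on one die.

    Assumption: both variation components are zero-mean Gaussian shifts.
    """
    # Inter-die component: a single shift shared by every transistor on the die.
    die_shift = rng.gauss(0.0, sigma_inter)
    # Random intra-die component: an independent shift per transistor (e.g. RDF).
    return [vt_nominal + die_shift + rng.gauss(0.0, sigma_intra)
            for _ in range(n_transistors)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical values: 6 transistors, 0.30 V nominal V_t, 30 mV sigma.
inter_only = sample_die_vt(6, 0.30, 0.03, 0.0, rng)
intra_only = sample_die_vt(6, 0.30, 0.0, 0.03, rng)

# Mismatch is measured here as the spread between transistors on the same die.
mismatch_inter = max(inter_only) - min(inter_only)
mismatch_intra = max(intra_only) - min(intra_only)

print("inter-die only, V_t spread on the die:", mismatch_inter)
print("intra-die only, V_t spread on the die:", mismatch_intra)
```

With only the inter-die component enabled, every transistor receives the same shift, so the spread within the die is zero even though the die as a whole has moved away from nominal. With only the random intra-die component enabled, transistors on the same die differ from one another.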
The random component of the intra-die variation can result in a significant mismatch between the neighboring transistors in a die [1]–[5].

In a static random-access memory (SRAM) cell, a mismatch in strength between neighboring transistors, caused by intra-die variations, can result in the failure of the cell [7]–[9]. For example, a cell failure can occur due to: 1) an increase in the cell access time (access-time failure); 2) unstable read (flipping of the cell data while reading) and/or write (inability to successfully write to a cell) operations (read/write failure); or 3) failure in the data-holding capability of the cell in the standby mode, i.e., flipping of the cell data when a supply voltage lower than the nominal one is applied (hold failure). Since these failures are caused by variations in the device parameters, they are known as parametric failures [8], [9]. There can also be hard failures (caused by opens or shorts) or soft failures due to soft errors. In this paper, we concentrate only on the parametric failures and, hereafter, the word "failure" refers to a parametric failure.

A failure in any of the cells in a column of the memory makes that column faulty. In a memory, redundant columns are used to improve the fault tolerance: when a column is detected as faulty, it is replaced by an available redundant column. Thus, if the number of faulty columns in a memory chip is larger than the number of available redundant columns, the chip is considered to be faulty (a similar argument holds for a memory designed with row redundancy). Hence, the probability of failure of a cell is directly related to the yield of a memory chip, and the intra-die-variation-induced device mismatch can significantly reduce the yield of a memory.
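The link between cell-failure probability, column redundancy, and chip failure described above can be sketched numerically. The following is a minimal illustration, not necessarily the yield model developed in this paper. It assumes that cell failures are independent and identically distributed, that a column is faulty if any of its cells fails, that the redundant columns themselves are fault-free, and that the chip fails when more columns are faulty than there are redundant columns. The function names and all parameter values are hypothetical.

```python
from math import comb, expm1, log1p

def column_failure_prob(p_cell, n_rows):
    """Probability that at least one of the n_rows cells in a column fails."""
    if p_cell >= 1.0:
        return 1.0
    # 1 - (1 - p_cell)**n_rows, written to stay accurate for very small p_cell.
    return -expm1(n_rows * log1p(-p_cell))

def chip_failure_prob(p_cell, n_rows, n_cols, n_redundant):
    """Probability that more than n_redundant of the n_cols columns are faulty.

    Assumptions: independent cell failures, fault-free redundant columns.
    Intended for modest array sizes; very large n_cols can overflow a float.
    """
    p_col = column_failure_prob(p_cell, n_rows)
    # Binomial tail: sum over every faulty-column count that exceeds the spares.
    return sum(comb(n_cols, k) * p_col**k * (1.0 - p_col)**(n_cols - k)
               for k in range(n_redundant + 1, n_cols + 1))

# Hypothetical array: 256 rows x 128 columns, cell-failure probability 1e-5.
for spares in (0, 2, 4, 8):
    p_fail = chip_failure_prob(1e-5, 256, 128, spares)
    print(f"{spares} redundant columns: chip failure probability {p_fail:.3e}")
```

Under these assumptions, adding redundant columns lowers the chip-failure probability for a fixed cell-failure probability. In a real design the spares also cost area, which is one reason the number of redundant columns becomes a quantity to optimize rather than simply maximize.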
As the effect of the intra-die variations increases with the technology