Timing Error Tolerance in Nanometer ICs

S. Valadimas, Y. Tsiatouhas* and A. Arapoyanni

University of Athens, Dept. of Informatics and Telecommunications, 15784 Athens, Greece
*University of Ioannina, Dept. of Computer Science, 45110 Ioannina, Greece

Abstract—Timing error tolerance is becoming an important design parameter in nanometer-technology, high-speed and high-complexity integrated circuits. In this work, we present a low-cost technique for multiple timing error detection and correction, which is based on a new Flip-Flop design. The proposed design approach provides timing error tolerance at the small penalty of one clock cycle of delay in the circuit operation for each error correction. In addition, it is characterized by very low silicon area requirements compared to previous design schemes in the open literature. The proposed technique has been applied to a 90nm pipeline design of a digital FIR filter, and the simulation results validated its efficiency.

Keywords: Timing failures, Timing errors, Error detection and correction, Timing error tolerance.

I. INTRODUCTION

The evolution of CMOS technology and the explosion in the complexity of integrated circuits and systems in the nanometer era make it increasingly difficult to achieve adequate reliability levels and to keep the cost of testing within acceptable bounds [1-2]. Device size scaling, power supply reduction and increasing operating frequencies affect circuit noise margins and reliability. In this context, the probability of transient faults increases, making it hard to keep error rates within specifications. Various mechanisms, such as crosstalk, power supply disturbances and ground bounce, have been identified as sources of timing errors. Increased path delay deviations due to process variations, as well as manufacturing defects that affect circuit speed, may also result in timing errors that are not easily detectable (in terms of test cost) in high-frequency and/or high-device-count integrated circuits (ICs).
Although very complex testing procedures are followed, they are not sufficient to exercise the huge number of paths in modern circuit designs, and consequently they cannot effectively screen out all ICs with timing-related defects. Thus, a considerable portion of defective ICs may escape the fabrication tests. Additionally, and for the same reasons, timing verification is becoming a hard task, which increases the probability of timing failures in a design. Furthermore, modern systems running at multiple frequency and voltage levels may suffer from an increased timing error rate due to numerous environmental, process-related and data-dependent variabilities that may affect circuit performance.

In addition, transistor aging significantly impacts the performance of nanometer circuits, resulting in the appearance of timing errors early in the circuit lifetime [3-4]. Examples are the Negative Bias Temperature Instability (NBTI) induced aging of PMOS transistors and the hot-carrier injection (HCI) induced aging of NMOS transistors, which increase the threshold voltage over time and thereby increase the path delays [5].

From the above, it is evident that concurrent on-line testing techniques for timing error detection and correction are becoming mandatory in order to achieve acceptable levels of error robustness and meet reliability requirements. Besides, dynamic voltage scaling (DVS) techniques for low-power operation may perform more efficiently when timing errors are tolerated, by exploiting error detection and correction mechanisms to overcome the increased error rates [6], [7].

Timing failures in a combinational logic block result in delayed responses at its outputs. When such a delayed response arrives after the triggering edge of the clock signal that drives the memory elements at the outputs of the combinational block, an erroneous value is captured, which produces a timing error in the data stored in the pertinent memory element.
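As an illustration of this mechanism, the following minimal Python sketch models a single memory element at the output of a combinational block. It is a simplified behavioral model and not part of the proposed design. The function names, the picosecond values and the setup-time parameter are assumptions introduced here for illustration. The model also assumes that a late response leaves the previous output value at the memory element input when the clock edge arrives, and it ignores metastability and hold-time effects.

```python
def captured_value(d_old, d_new, path_delay_ps, clock_period_ps, t_setup_ps):
    """Return the value stored by the memory element at the next clock edge.

    d_old           -- previous (stale) response of the combinational block
    d_new           -- correct response for the current inputs
    path_delay_ps   -- time the block needs to produce d_new (hypothetical value)
    clock_period_ps -- clock period
    t_setup_ps      -- setup time of the memory element (assumed parameter)
    """
    # The new response must be stable at least t_setup before the clock edge.
    deadline_ps = clock_period_ps - t_setup_ps
    if path_delay_ps <= deadline_ps:
        return d_new   # response arrived in time: correct value is stored
    return d_old       # response arrived late: stale value is stored


def has_timing_error(d_old, d_new, path_delay_ps, clock_period_ps, t_setup_ps):
    """A timing error is observed only if the stored value differs from d_new."""
    stored = captured_value(d_old, d_new, path_delay_ps,
                            clock_period_ps, t_setup_ps)
    return stored != d_new


if __name__ == "__main__":
    # Hypothetical numbers: 1000 ps clock period, 50 ps setup time.
    for delay in (900, 1100):
        err = has_timing_error(d_old=0, d_new=1, path_delay_ps=delay,
                               clock_period_ps=1000, t_setup_ps=50)
        print(f"path delay {delay} ps -> timing error: {err}")
```

Note that in this model a late response produces an error only when the stale and the correct values differ, so whether a timing failure becomes a timing error depends on the data.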
A number of error detection techniques have been proposed in the open literature [8]–[14]. These sense the delayed circuit response and provide error tolerance using time redundancy approaches. A well-known error detection scheme is based on a comparator that is realized by a simple XOR gate [10]–[12]. The monitoring circuitry consists of an additional memory element plus an XOR gate for every memory element (main latch or Flip-Flop) in the design. The secondary memory element is clocked by a delayed version of the system clock that feeds the main memory element. This delay is equal to the maximum signal delay (d_max) that must be tolerated in order to achieve an acceptable timing error rate, plus the setup time (t_su) of the memory elements used. Thus, the secondary memory element captures a delayed version of the data stored in the main memory element. In the presence of a timing error the data stored in the two memory elements differ, and the secondary memory element holds the correct, delayed response of the combinational logic. The XOR gate "compares" the contents of the two memory elements and, in case of a discrepancy, raises its output to high to indicate that an error has been detected. The local error indication signals are collected by an OR gate (realized as an OR tree) to generate a global error indication signal. This signal is exploited to achieve error tolerance by performing a retry procedure after error detection. During the retry operation the period of the system clock must be increased to provide the necessary time for correct response evaluation.

In this work we present a low-cost timing error detection and correction scheme that is based on a new Flip-Flop topology. Moreover, we introduce a pipeline architecture that exploits this Flip-Flop to provide timing error tolerance in a design. The paper is organized as follows.
In Section II, relevant timing error detection and correction techniques, presented in the open literature, are discussed. Next, in Section III, the new design technique is introduced and its operation is analyzed. Section IV provides simulation results
978-1-4244-7723-4/$26.00 © 2010 IEEE