Investigation of transient fault effects in synchronous and asynchronous Network on Chip router

Pooria M. Yaghini, Ashkan Eghbal ⇑, Hossein Pedram, Hamid Reza Zarandi

Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran

Article info

Article history: Received 27 March 2010. Received in revised form 26 July 2010. Accepted 4 October 2010. Available online 13 October 2010.

Keywords: Network on chip router; Asynchronous and synchronous design; Fault tolerance evaluation; Fault injection

Abstract

This paper presents a comparison of transient fault effects in an asynchronous NoC router and a synchronous one. The experiment uses a simulation-based fault injection method to assess the fault-tolerant behavior of both architectures. This is accomplished by employing a fault injector signal (FIS) in both the asynchronous and the synchronous design. Different fault models, such as Crosstalk, SEU, and SET, have been applied to both architectures to evaluate their robustness. A glitch fault model has also been injected into the asynchronous scheme. The experimental results are examined from different aspects to estimate the NoC router's robustness. Although asynchronous designs seem inherently fault-tolerant because they use handshaking signals, up to 55% of the injected faults result in failure, and about 44% of the injected faults are replaced by new values before turning into errors. Less than 1% of the injected faults are treated as latent errors. Moreover, the failure rate of token generation is higher than that of token consumption. Furthermore, the experiments show that the asynchronous NoC router is more robust than the synchronous one because it prevents fault propagation.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Advances in technology have made it practical to place more gates on the same silicon die.
Data transmission across a chip is becoming more difficult as system complexity increases. A new, improved communication infrastructure is needed to provide well-developed connectivity among the processing elements (such as CPU, DSP, or memory cell) on a single chip. The Network on Chip architecture is a practical alternative to the traditional System on Chip (SoC) approach, supporting better modularity, scalability, and higher bandwidth [1].

Faulty operation of such interconnections might affect the functionality of the connected processing elements. Controlling the physical parameters of the fabrication process has become harder as dimensions shrink [2,3]. Faults arising from Crosstalk, Electro Migration (EM), Electromagnetic Interference (EMI), alpha particle hits, and cosmic radiation can affect the functionality of a NoC router or eventually lead it to failure [4]. Because of these phenomena, the fault tolerance of digital systems has become a matter of considerable concern among designers.

By definition, a fault-tolerant system should operate correctly in the presence of hardware or software faults. In other words, a fault-tolerant design identifies the potential causes of failures and recovers from them. This identification is performed by means of hardware, time, or information redundancy. To this end, a comprehensive study is necessary to find the most sensitive components of a digital system. This analysis can be carried out in two ways: formal and experimental. In the formal method, the system is modeled by mathematical formulas; with such models, the reliability of a system can be estimated in a shorter time than with the experimental method. Experimental analysis is done by means of fault injection, which provides a faulty environment in which to evaluate the robustness of the system.
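To make the idea of an experimental fault-injection campaign concrete, the following is a minimal sketch under stated assumptions, not the setup used in this paper. It assumes a toy circular buffer standing in for a router storage element; the names `run`, `classify`, and `campaign`, and the sizes `DEPTH`, `WIDTH`, and `LAG`, are hypothetical. Each single bit flip is compared against a fault-free (golden) run and sorted into the three outcomes named in the abstract: failure, replaced by a new value (overwritten), or latent.

```python
from collections import Counter
from itertools import product

DEPTH = 4   # buffer slots (hypothetical size)
WIDTH = 8   # bits per stored word (hypothetical size)
LAG = 2     # cycles between writing a word and reading it back


def run(words, fault=None):
    """Simulate the toy buffer for len(words) cycles.

    fault = (cycle, slot, bit) flips one stored bit at the start of that
    cycle. Returns (words read out, final buffer contents).
    """
    buf = [0] * DEPTH
    out = []
    for cycle, word in enumerate(words):
        if fault is not None and fault[0] == cycle:
            _, slot, bit = fault
            buf[slot] ^= 1 << bit              # single bit flip
        buf[cycle % DEPTH] = word              # write side
        if cycle >= LAG:
            out.append(buf[(cycle - LAG) % DEPTH])  # read side
    return out, buf


def classify(words, fault):
    """Compare a faulty run against the golden (fault-free) run."""
    golden_out, golden_state = run(words)
    out, state = run(words, fault)
    if out != golden_out:
        return "failure"        # corrupted value reached the output
    if state != golden_state:
        return "latent"         # corruption still stored, never observed
    return "overwritten"        # a later write replaced the faulty value


def campaign(words):
    """Inject every (cycle, slot, bit) fault once and tally the outcomes."""
    sites = product(range(len(words)), range(DEPTH), range(WIDTH))
    return Counter(classify(words, site) for site in sites)


if __name__ == "__main__":
    words = [0x11 * (i + 1) for i in range(8)]
    print(campaign(words))
```

The exhaustive sweep is only feasible because the model is tiny; a campaign on a real design would normally sample injection sites and times instead.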
Fault injection is a popular technique for evaluating the dependability attributes of a system [5]. It can be implemented using three main strategies: physical, simulation-based, and software-based fault injection. In the physical method, faults are injected into the actual device. The software strategy tries to produce the errors that might occur during the system's operation. The method considered in this article is the simulation-based technique, in which the system is simulated on another computer. It is more popular because of its high observability and controllability. It can also be used in the early stages of system design, reducing the testing cost. In this method, the sensitive parts are observed in a faulty environment [6].

Journal of Systems Architecture 57 (2011) 61–68. doi:10.1016/j.sysarc.2010.10.003
⇑ Corresponding author. Tel.: +98 9121976404. E-mail addresses: pooria.yaghini@aut.ac.ir (P.M. Yaghini), ashkanxy@aut.ac.ir (A. Eghbal), pedram@aut.ac.ir (H. Pedram), h_zarandi@aut.ac.ir (H.R. Zarandi).

To keep the cost of imposing redundancy low, exhaustive research seems necessary to find the most vulnerable components in a NoC router. A few studies have been performed to estimate the fault-tolerant property of synchronous NoC routers with