System-on-Chip Oriented Fault-Tolerant Sequential Systems Implementation Methodology

S. Pontarelli, G.C. Cardarilli, A. Malvoni, M. Ottavi, M. Re, A. Salsano
{ottavi,pontarelli,salsano}@ing.uniroma2.it {marco.re, g.cardarilli}@ieee.org malvoni@wappi.com
Department of Electronic Engineering, University of Rome “Tor Vergata”, Italy
Via di Tor Vergata 110, 00133 Rome, ITALY

ABSTRACT

This paper presents a design methodology for fault tolerant sequential systems implemented on a System on Chip (SoC). As an example, a complex fault tolerant finite state machine has been mapped onto the FPGA contained in the SoC. Faults are identified by a checker that detects a class of faults. When a fault is detected, an interrupt for the microcontroller is generated and the interrupt handling routine partially reprograms the FPGA to overwrite the part of the memory that configures the faulty block. The architectures of the SoCs that have recently appeared on the market are characterized by a very efficient interaction between the microcontroller and the FPGA, which allows an efficient implementation of the fault detection and fault recovery strategy. A test bed of the proposed methodology has been implemented on the recently presented Atmel AT94K FPSLIC (Field Programmable System Level Integrated Circuits).

1. INTRODUCTION

Different approaches to the design of high reliability digital electronic systems for hostile environments have been proposed in the past. The technique used depends on the level of reliability and security required by the specific application. There are two main approaches to the design of high reliability electronic systems: Fault Avoidance and Fault Tolerance. The former can be accomplished by a technology based approach, e.g. Radiation-Hard components (like S.O.I. or similar) [1], while the latter can be accomplished by suitable system level techniques [2].
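To make the detect, interrupt and partial reprogramming flow summarized in the abstract more concrete, the following is a minimal software simulation, not the authors' implementation. The block names, the parity-based checker, the single configuration word per block and the stored reference ("golden") configuration are all assumptions introduced for illustration; none of them reflects the actual AT94K interface.

```python
"""Toy simulation of a detect / interrupt / partial-reconfiguration loop.

Everything here is hypothetical: the block names, the parity checker and the
golden-configuration store are illustration only, not the AT94K interface.
"""
from dataclasses import dataclass, field
from typing import Dict, List


def parity(word: int) -> int:
    """Return 1 if the number of set bits in word is odd, else 0."""
    return bin(word).count("1") & 1


@dataclass
class Fabric:
    """Reconfigurable fabric modelled as one configuration word per block."""
    golden: Dict[str, int]                     # reference configuration (assumed available)
    live: Dict[str, int] = field(init=False)   # configuration currently loaded
    reloaded: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.live = dict(self.golden)

    def inject_fault(self, block: str, bit: int) -> None:
        """Flip one configuration bit to emulate a fault."""
        self.live[block] ^= 1 << bit

    def checker(self) -> List[str]:
        """Return the blocks whose parity no longer matches the reference.

        A parity check only catches an odd number of flipped bits,
        i.e. one class of faults.
        """
        return [name for name, word in self.live.items()
                if parity(word) != parity(self.golden[name])]

    def partial_reconfigure(self, block: str) -> None:
        """Rewrite only the configuration word of the given block."""
        self.live[block] = self.golden[block]
        self.reloaded.append(block)


def interrupt_handler(fabric: Fabric, faulty: List[str]) -> None:
    """Stand-in for the interrupt routine: reload each flagged block."""
    for block in faulty:
        fabric.partial_reconfigure(block)


def demo() -> Fabric:
    fabric = Fabric(golden={"next_state_logic": 0b1011, "output_logic": 0b0110})
    fabric.inject_fault("next_state_logic", bit=2)
    faulty = fabric.checker()
    if faulty:  # the checker output plays the role of the interrupt request
        interrupt_handler(fabric, faulty)
    return fabric


if __name__ == "__main__":
    result = demo()
    print(result.reloaded, result.live == result.golden)
```

Only the flagged block is rewritten, which mirrors the idea of partial rather than full reconfiguration. The parity checker is chosen purely for brevity; it misses any even number of bit flips within a block, so a real checker would be selected according to the fault class that must be covered.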
Radiation-Hard (RH) components are built using expensive technological processes and guarantee the required protection against high-energy radiation. The main drawbacks of radiation-hard components are their cost and the unavailability of state-of-the-art parts in an RH version. As stated before, fault tolerance is obtained by system level strategies such as Triple Modular Redundancy (TMR) or, in general, N Modular Redundancy (NMR), originally suggested by Von Neumann [1], and fault detection, fault masking and recovery [2].

Nowadays, a large research effort is devoted to the fault detection approach. This method requires less hardware redundancy than the TMR approach, but it requires reconfiguration algorithms (software redundancy) and consequently introduces downtime, quantified by the MTTR (Mean Time to Repair). Moreover, in space applications, where the number of manufactured parts is small, the use of standard components that can be reconfigured by a simple and inexpensive procedure is becoming increasingly important. Currently, some companies like Atmel [4] and Triscend [5] are proposing solutions (called System on Chip or SoC) that integrate on the same chip a considerable amount of memory, a field programmable logic array

Proceedings of the 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’01) 1063-6722/01 $17.00 © 2001 IEEE