System-on-Chip Oriented
Fault-Tolerant Sequential Systems Implementation Methodology
S. Pontarelli, G.C. Cardarilli, A. Malvoni, M. Ottavi, M. Re, A. Salsano
{ottavi,pontarelli,salsano}@ing.uniroma2.it
{marco.re, g.cardarilli}@ieee.org
malvoni@wappi.com
Department of Electronic Engineering, University of Rome “Tor Vergata”, Italy
Via di Tor Vergata 110
00133-Rome-ITALY
ABSTRACT
This paper presents a design methodology for fault-tolerant sequential systems implemented on a System on Chip
(SoC). As an example, a complex fault-tolerant finite state machine has been mapped onto the FPGA
contained in the SoC. Fault identification is obtained by using a checker that permits the identification of
a class of faults.
When a fault is detected, an interrupt for the microcontroller is generated and the interrupt handling routine
partially reprograms the FPGA to overwrite the part of memory configuring the faulty block.
The architectures of the SoCs that have recently appeared on the market are characterized by a very efficient
interaction between the microcontroller and the FPGA, which allows an efficient implementation of the fault
detection and fault recovery strategy. A test bed of the proposed methodology has been implemented on the
recently presented Atmel AT94K FPSLIC (Field Programmable System Level Integrated Circuits).
1. INTRODUCTION
Different approaches to the design of high-reliability digital electronic systems for hostile environments have been
proposed in the past. The technique used depends on the level of reliability and security required by the
specific application.
There are two main approaches to the design of high-reliability electronic systems: fault avoidance and
fault tolerance. The former can be accomplished by a technology-based approach, e.g. Radiation-Hard
components (like S.O.I. or similar) [1], while the latter can be accomplished by suitable system-level
techniques [2].
Radiation-Hard (RH) components are built by using expensive technological processes and guarantee the required
protection against high-energy radiation. The main drawbacks of using radiation-hard components are the
component cost and the unavailability of state-of-the-art parts in the RH version.
As stated before, fault tolerance is obtained by using system-level strategies such as:
• Triple Modular Redundancy (TMR) or, in general, N Modular Redundancy (NMR), originally suggested by
Von Neumann [1]
• Fault detection, fault masking and recovery [2]
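As an illustration of the TMR strategy listed above, the following is a minimal software sketch of a 2-of-3 majority voter, assuming the three replica outputs are available as integer words. The function names and the example values are hypothetical and are not taken from the implementation described in this paper.

```python
from typing import List, Sequence, Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each bit takes the value held by at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def tmr(outputs: Sequence[int]) -> Tuple[int, List[int]]:
    """Return the voted word and the indices of the replicas that disagree with it."""
    if len(outputs) != 3:
        raise ValueError("TMR needs exactly three replica outputs")
    voted = majority_vote(*outputs)
    disagreeing = [i for i, word in enumerate(outputs) if word != voted]
    return voted, disagreeing


# Hypothetical example: replica 2 suffers a single bit flip in its most significant bit.
voted, faulty = tmr([0b1011, 0b1011, 0b0011])
```

In this sketch the voter masks the fault (the voted word equals the two fault-free replicas) and, as a side product, reports which replica disagrees. That report is the kind of information a detection-and-recovery scheme would act upon.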
Nowadays, a considerable research effort is spent on the fault detection approach. This method presents the benefit of
lower hardware redundancy with respect to the TMR approach, but it requires reconfiguration
algorithms (software redundancy) and consequently incurs downtime, quantified by the MTTR (Mean Time to Repair).
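The effect of the MTTR on downtime can be sketched with the standard steady-state availability relation A = MTTF / (MTTF + MTTR). The snippet below is only an illustration under that textbook model. The function names and all numeric figures are hypothetical and do not come from measurements in this paper.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is operational: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_hours(mttf_hours: float, mttr_hours: float,
                            mission_hours: float) -> float:
    """Expected cumulative downtime over a mission of the given length."""
    unavailability = 1.0 - steady_state_availability(mttf_hours, mttr_hours)
    return mission_hours * unavailability


# Hypothetical figures: one fault every 999 h on average, 1 h to detect and reconfigure.
availability = steady_state_availability(999.0, 1.0)
downtime = expected_downtime_hours(999.0, 1.0, 1000.0)
```

Under this model, shortening the repair time (for example through a faster reconfiguration procedure) reduces the expected downtime proportionally when the MTTR is much smaller than the MTTF.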
Moreover, in space applications, where the number of manufactured parts is small, the use of standard
components, which can be reconfigured by a simple and inexpensive procedure, is becoming increasingly
important. Currently, some companies like Atmel [4] and Triscend [5] are proposing solutions (called System on
Chip or SoC) that integrate on the same chip a considerable amount of memory, a field programmable logic array
Proceedings of the 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’01)
1063-6722/01 $17.00 © 2001 IEEE