An Industrial Fault Injection Platform for Soft-Error Dependability Analysis and Hardening of Complex System-On-a-Chip

Jean-Marc Daveau, Alexandre Blampey, Gilles Gasiot, Joseph Bulone, Philippe Roche
STMicroelectronics, 850 Rue Jean Monnet, F-38926 Crolles, France.
12 Rue Jules Howrowitz, BP217, F-38019, Grenoble, France.
Email: name.surname@st.com

Abstract—This paper presents a fault injection platform that is currently being developed and used to perform soft-error dependability analysis and hardening of complex SoCs¹. It is primarily oriented toward safety analysis, safety requirement conformance testing and hardening of complex SoCs. The platform uses the clusters of hardware emulation resources available for SoC verification to achieve massive fault injection capabilities. It can distribute fault injection campaigns across multiple heterogeneous emulation platforms to achieve high fault coverage. It can handle virtually any circuit size and is designed to support all kinds of designs. We present the first results obtained on a small design, the Leon2 IP, on which exhaustive fault injection has been performed.

I. INTRODUCTION

Soft-error dependability and failure mode analysis of complex systems can be addressed by different approaches, from high-level simulation down to gate-level fault injection. Such analysis is a mandatory design validation phase when compliance with safety requirements and certification are expected. Several analyses of the system's reaction to faults have to be performed, and safety metrics measured, in order to obtain a given certification. The platform and methodology presented in this paper address two of them:
1) soft-error-rate analysis, i.e. extraction of error-rate metrics such as Safe Failure Fraction (SFF) or Hardware Fault Tolerance (HFT) under several fault models.
2) failure mode analysis, i.e. the analysis and classification of the consequences of failures caused by soft-errors.
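To make the relation between the two analyses concrete, the following minimal Python sketch classifies the outcomes of a fault injection campaign and derives an SFF estimate from them. It is an illustration under stated assumptions, not the platform's implementation: the outcome labels and function names are hypothetical, the SFF formula is the one commonly given in IEC 61508 (safe plus dangerous-detected failures over all failures), and every injected fault is assumed to carry the same weight, so that outcome counts stand in for failure rates.

```python
from collections import Counter

# Hypothetical outcome labels for one injected fault (not taken from the paper).
SAFE = "safe"                                  # no dangerous effect on the system
DANGEROUS_DETECTED = "dangerous_detected"      # dangerous, caught by a safety mechanism
DANGEROUS_UNDETECTED = "dangerous_undetected"  # dangerous and not caught


def failure_mode_histogram(outcomes):
    """Count how many injected faults fall into each outcome class."""
    return dict(Counter(outcomes))


def safe_failure_fraction(outcomes):
    """Estimate SFF = (safe + dangerous detected) / (all classified failures).

    Assumes every injected fault has the same weight, so counts can be
    used in place of failure rates. Labels other than the three above
    are ignored.
    """
    counts = Counter(outcomes)
    total = (counts[SAFE]
             + counts[DANGEROUS_DETECTED]
             + counts[DANGEROUS_UNDETECTED])
    if total == 0:
        raise ValueError("no classified outcomes")
    return (counts[SAFE] + counts[DANGEROUS_DETECTED]) / total


if __name__ == "__main__":
    # Illustrative campaign result: made-up numbers, not measured data.
    campaign = ([SAFE] * 90
                + [DANGEROUS_DETECTED] * 8
                + [DANGEROUS_UNDETECTED] * 2)
    print(failure_mode_histogram(campaign))
    print(safe_failure_fraction(campaign))
```

In a real campaign the weights would normally come from per-site soft-error rates rather than from a uniform count, which is why the result above should be read as an estimate only.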

Different approaches are presented in the literature [1] to evaluate the reliability of digital systems with respect to soft-errors, using probability theory [2]–[7], fault injection by formal methods [8], simulation [9], [10] and prototyping [11]–[14], or working at system level [15]. From an industrial point of view, simulation, formal and probabilistic methods are limited by the size of the system they can handle. These methods are also limited by their need for specific, possibly formal, models or knowledge of the target architecture, which may not be available for intellectual-property reasons in an industrial environment.

¹ System-On-a-Chip

Fault injection can take several forms depending on the level of abstraction used to represent the system and the faults applied. In this paper we address fault injection at a very low level of abstraction: the gate-level netlist. This close-to-silicon level of abstraction gives access to information about soft-error sites, such as the actual flip-flops and memories of the design, and allows realistic fault models to be applied. It can accurately model several soft-error types and address a wide range of systems with a single, systematic instrumentation approach. Still, this level of abstraction is high enough to permit fast simulation and a comprehensive study of soft-error consequences.

When specifying the SoC fault injection platform and methodology, several objectives were defined:
• high speed. Advanced fault injection methods may require a large number of faults to be injected. Therefore the platform should accommodate millions of injections in a reduced time.
• high gate capacity. In order to handle large industrial designs such as SoCs, a capacity of several million gates and flip-flops is required.
• intellectual property. In order to avoid any intellectual property issue, the methodology should require neither a high-level description or model of the analyzed system nor any details of its architecture.
• early evaluation.
Early evaluations may be requested for high-reliability designs. Therefore the methodology should also accommodate fault injection on high-level or partial designs.
• systematic approach. The methodology should address a wide range of designs without the need to adapt the tools and methods.
• safety analysis. The ability to perform FMDEA² implies supporting advanced observability and controllability features and fault models corresponding to different failure modes.

² Failure Mode Diagnostic and Effects Analysis

To cope with the first two issues, our approach relies on industrial hardware emulators which offer the best performance