An FPGA-Based Framework for Run-time Injection and Analysis of Soft Errors in Microprocessors

M. Sauer 1, V. Tomashevich 2, J. Müller 1, M. Lewis 1, A. Spilla 1, I. Polian 2, B. Becker 1, and W. Burgard 1

1 Albert-Ludwigs-University, Georges-Köhler-Allee 51, 79110 Freiburg i. Br., Germany
{sauerm,muellerj,spilla,lewis,becker,burgard}@informatik.uni-freiburg.de
2 University of Passau, Innstraße 43, 94032 Passau, Germany
{victor.tomashevich,ilia.polian}@uni-passau.de

Abstract—State-of-the-art cyber-physical systems are increasingly deployed in harsh environments with non-negligible soft error rates, such as aviation or search-and-rescue missions. At the same time, state-of-the-art nanoscale manufacturing technologies are more vulnerable to soft errors. In this paper, we present an FPGA-based framework for injecting soft errors into user-specified memory elements of an entire microprocessor (MIPS32) running application software. While the framework is applicable to arbitrary software, we demonstrate its usage by characterizing the effects of soft errors on several software filters used in aviation for probabilistic sensor data fusion.

I. INTRODUCTION

Soft errors cause nodes within a circuit to temporarily fail. They are typically generated by ionizing radiation from α-particles or cosmic rays [10]. As modern transistors shrink, the probability of a fault occurring increases. Soft errors have traditionally been a concern in safety-critical systems, including medical devices [4] and aviation/space applications where chips operate under increased radiation [7], [23]. Today, cost pressure and energy constraints limit the applicability of massive redundancy, while the increased complexity of calculations performed in novel applications, such as robots on search-and-rescue missions, necessitates the usage of powerful microprocessors.
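To make the notion of a soft error in a storage element concrete, the following minimal sketch models a single-bit upset as an XOR on a 32-bit word. It is an illustration under stated assumptions, not the hardware fault injector of the framework: the names `flip_bit` and `inject_random_upset`, the representation of registers as a Python list, and the uniform random choice of register and bit are all hypothetical.

```python
import random

WORD_BITS = 32  # assumption: 32-bit storage words, as in a MIPS32 register


def flip_bit(word: int, bit: int) -> int:
    """Return `word` with one bit inverted, modelling a single-bit upset."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit index out of range")
    # XOR with a one-hot mask inverts exactly the selected bit.
    return (word ^ (1 << bit)) & 0xFFFFFFFF


def inject_random_upset(registers: list[int], rng: random.Random) -> tuple[int, int]:
    """Flip one randomly chosen bit of one randomly chosen register, in place.

    Returns the (register index, bit index) that was hit, so that the
    outcome of an experiment can later be related to the fault location.
    """
    reg = rng.randrange(len(registers))
    bit = rng.randrange(WORD_BITS)
    registers[reg] = flip_bit(registers[reg], bit)
    return reg, bit


if __name__ == "__main__":
    regs = [0, 42, 0xFFFFFFFF]  # hypothetical register contents
    location = inject_random_upset(regs, random.Random(1))
    print(location, [hex(r) for r in regs])
```

Applying `flip_bit` twice with the same bit index restores the original word. The model therefore only corrupts the stored value once, and any lasting effect comes from how the program uses the corrupted value afterwards.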
On the positive side, there is also emerging evidence that many applications, including image-processing [24] and artificial-intelligence algorithms [18], are resilient, i.e., they produce tolerable results even when they are affected by soft errors during operation. Before spending significant hardware resources on radiation hardening or redundancy, it is necessary to understand what impact soft errors would have on the target application.

In this paper, we introduce an FPGA-based fault-injection platform to test and simulate transient faults in microprocessors. The platform can provide insight into the design's soft error characteristics, allowing a designer to logically harden a chip by adding error correction and to test the improvements before the chip is actually produced. The platform is flexible with respect to the target processor (which is synthesized on the FPGA and equipped with a scan-based fault injector), the application software, the input data, and the profiles of the faults injected. The generic fault-injection manager, running on the PC side, encapsulates all the technical details and controls the FPGA over a communication protocol: it transfers fault-injection information and the input data to the FPGA, and it receives and evaluates the obtained output data. By allowing the FPGA to communicate with further external devices, we can test the susceptibility of the processor when it is running its native applications in its usual environment.

Although a significant amount of research has resulted in software simulation methods [1], [5], [11], [22], these tools are not powerful enough to simulate entire SoCs running real applications such as the probabilistic filters reported here. Radiation testing [27], [29] can provide insight but requires access to a radiation testing facility and a non-trivial conversion of the observed error rates [16]. Therefore, FPGA-based emulation has been used to replace [6], [19] or complement [17], [25] software simulation.
A good overview of much of this research can be found in [12]. More recently, the partial-reconfiguration features of FPGAs [2], [21] and large parallel emulators [8] have been utilized for fault injection. Ongil et al. [20] discuss various types of fault-injection techniques for medium-size non-programmable circuits. In [14], these techniques are applied to the Leon 2 processor and compared with the software-based code-emulated upsets method [13].

Our objective is to use an entire System on Programmable Chip (SoPC) design that incorporates a MIPS32-based microprocessor with peripheral devices, such as general-purpose I/Os and serial UARTs, along with a programmable fault injector. Compared with the software approaches, we are still able to simulate millions of clock cycles per second in real time for our SoPC, and our approach does not limit the areas of the processor into which faults can be injected. In our case, all storage elements can be affected by the transient faults that we inject. While we use ideas from some of these publications (most notably the general design of the fault injector from [6]), our framework focuses on evaluating complex software applications.

The FPGA emulation architecture is described in Section II. Background on the Bayesian filters used as the software application under test is given in Section III. Section IV presents experimental results. Section V concludes the paper.

II. RUN-TIME FAULT INJECTION

Our FPGA-based fault-injection framework, shown in Figure 1, consists of an FPGA part, on which the actual fault-injection experiment is run, and a PC part. The latter is used to (1) communicate the inputs of the experiment to the FPGA part; (2) control the experiment execution; and (3) receive and analyze its outcomes. The FPGA part implements the target processor on which the application under test is run; the fault injector; the memory; and controllers to handle input/output