Hardware Fault Injection Using Dynamic Binary Instrumentation: FITgrind

Ute Wappler, Christof Fetzer
Technische Universität Dresden
Department of Computer Science
Dresden, Germany
{ute.wappler,christof.fetzer}@inf.tu-dresden.de

Abstract

To test software-implemented hardware fault tolerance (SIHFT) mechanisms, injection of hardware faults is the appropriate instrument. Existing tools are mostly processor dependent, difficult to use, and do not allow for a fine-grained fault propagation analysis. FITgrind, which uses an artificial hardware architecture provided by Valgrind [6], alleviates these issues.

1 Motivation

Fault injection is a useful and accepted tool for testing fault tolerance mechanisms. Injection of transient, intermittent, and permanent hardware faults is required to analyze SIHFT, whose objective is to detect such hardware faults. The most realistic but also most expensive hardware fault injection approaches are hardware-based and use, e.g., radiation or pin-level fault injection. Since these are very costly and might even destroy the hardware, many software-based solutions have been developed over time. These either simulate a hardware architecture or inject faults by changing the execution of a program, directly modifying either hardware or software state.

We have been developing a new fault injection tool, FITgrind, which uses dynamic binary instrumentation provided by Valgrind [6]. The tool abstracts from the underlying hardware architecture, and faults are injected into the artificial architecture provided by Valgrind. The tool alleviates problems of existing tools such as Xception [1] or FERRARI [5]:

• Processor independence: neither FERRARI nor Xception provides it. FERRARI needs a specification of the processor's instruction set, and Xception uses the processor's debugging features. FITgrind can be used on all platforms supported by Valgrind, which are currently x86, amd64, ppc32, and ppc64 running Linux.
• Monitoring and taint analysis: an analysis similar to [2] or [7] is realizable using Valgrind's API. This will enable a fine-grained analysis of the propagation of faults and thus a precise evaluation of SIHFT mechanisms. To the best of our knowledge, none of the existing injection tools supports fine-grained fault propagation analysis. At best, they log the context in which a fault is injected, which fault is injected where, and the outcome of the fault injection run.

• Ease of use: neither a specification of a hardware model (FERRARI) nor a complex testbed requiring more than one machine (Xception) is necessary.

• Injection into binaries: faults can be injected into binaries without recompiling them to assess their fault tolerance. Additionally, recompilation and a lightweight instrumentation will give the user even more control over the fault injection process and the ability to do a fine-grained taint analysis.

• Ease of implementation and extensibility: the first version of FITgrind, which is able to inject transient bit flips on operands and results as well as transient or permanent replacement of instructions, requires only about 1000 lines of C code and is easy to extend. For each new fault type, one has to provide a method to select the injection points and a method to add the instrumentation code.

The following fault types can in principle be injected by binary instrumentation:

• Modification of operands and results to simulate bit flips or stuck-at faults in memory, registers, or on buses.

• Exchange of operands with other operands to simulate address line faults.

• Replacement of instructions with other valid instructions or groups of instructions to simulate address line faults.

• Faulty instruction execution to simulate bugs such as the famous Pentium FDIV bug [4]. Note that such a bug has to be mapped onto the hardware architecture simulated by Valgrind.

• Modification of jump conditions and destinations to simulate control flow errors.
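The first two fault types in the list above can be illustrated at the level of a single value. The following is a minimal sketch in C, not FITgrind's actual code: the names fault_kind, inject_fault, and exchange_operands are hypothetical, the operand width is assumed to be 32 bits, and no Valgrind API is used. In a real tool, calls of this kind would be placed by the instrumentation at the selected injection points.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fault kinds, introduced only for this illustration. */
typedef enum {
    FAULT_BIT_FLIP,    /* invert one bit of the value             */
    FAULT_STUCK_AT_0,  /* force one bit of the value to 0         */
    FAULT_STUCK_AT_1   /* force one bit of the value to 1         */
} fault_kind;

/* Apply a single-bit fault to a 32-bit operand or result.
 * 'bit' is the bit position, 0 (least significant) to 31. */
uint32_t inject_fault(uint32_t value, fault_kind kind, unsigned bit)
{
    assert(bit < 32);
    uint32_t mask = (uint32_t)1 << bit;

    switch (kind) {
    case FAULT_BIT_FLIP:   return value ^ mask;
    case FAULT_STUCK_AT_0: return value & ~mask;
    case FAULT_STUCK_AT_1: return value | mask;
    }
    return value; /* unknown kind: leave the value unchanged */
}

/* Swap two operands, modelling an address line fault that
 * delivers the wrong operand to an instruction. */
void exchange_operands(uint32_t *a, uint32_t *b)
{
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}
```

For example, inject_fault(0, FAULT_BIT_FLIP, 3) yields 8, while inject_fault(8, FAULT_STUCK_AT_1, 3) leaves the value at 8 because the affected bit is already set. One plausible way to model the difference in persistence is to apply a bit flip at a single injection point (transient) and a stuck-at fault on every access to the affected location (permanent).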
It would be also possible to simulate bit flips in instruc-