Hardware Fault Injection Using Dynamic Binary Instrumentation: FITgrind

Ute Wappler, Christof Fetzer
Technische Universität Dresden
Department of Computer Science
Dresden, Germany
{ute.wappler,christof.fetzer}@inf.tu-dresden.de

Abstract

Injecting hardware faults is the appropriate way to test software-implemented hardware fault tolerance (SIHFT) mechanisms. Existing tools are mostly processor dependent, difficult to use, and do not allow for a fine-grained fault propagation analysis. FITgrind, which uses an artificial hardware architecture provided by Valgrind [6], alleviates these issues.

1 Motivation

Fault injection is a useful and accepted tool for testing fault tolerance mechanisms. Analyzing SIHFT, whose objective is to detect transient, intermittent, and permanent hardware faults, requires injecting such faults. The most realistic but also most expensive hardware fault injection approaches are hardware-based and use, e.g., radiation or pin-level fault injection. Since these are very costly and might even destroy the hardware, many software-based solutions have been developed over time. They either simulate a hardware architecture or inject faults by changing the execution of a program, directly modifying either hardware or software state.

We have been developing a new fault injection tool, FITgrind, which uses the dynamic binary instrumentation provided by Valgrind [6]. The tool abstracts from the underlying hardware architecture: faults are injected into the artificial architecture provided by Valgrind. The tool alleviates problems of existing tools such as Xception [1] or FERRARI [5]:

• Processor independence: this is provided by neither FERRARI nor Xception. FERRARI needs a specification of the processor's instruction set, and Xception uses the processor's debugging features. FITgrind can be used on all platforms supported by Valgrind, which are currently x86, amd64, ppc32 and ppc64 running under Linux.
• Monitoring & taint analysis: monitoring and taint analysis similar to [2] or [7] can be realized using Valgrind's API. This will enable a fine-grained analysis of fault propagation and thus a precise evaluation of SIHFT mechanisms. To the best of our knowledge, none of the existing injection tools supports fine-grained fault propagation analysis. At best, they log the context in which a fault is injected, which fault is injected where, and the outcome of the fault injection run.
• Ease of use: neither a specification of a hardware model (FERRARI) nor a complicated testbed requiring more than one machine (Xception) is necessary.
• Injection into binaries: faults can be injected into binaries without recompiling them to assess their fault tolerance. Additionally, recompilation and a light-weight instrumentation will give the user even more control over the fault injection process and the ability to do a fine-grained taint analysis.
• Ease of implementation and extensibility: the first version of FITgrind, which is able to inject transient bit flips on operands and results as well as transient or permanent replacements of instructions, requires only about 1000 lines of C code and is easy to extend. For each new fault type, one has to provide a method to select the injection points and a method to add the instrumentation code.

The following fault types can in principle be injected by binary instrumentation:

• Modification of operands and results to simulate bit flips or stuck-at faults in memory, in registers, or on buses.
• Exchange of operands with other operands to simulate address line faults.
• Replacement of instructions with other valid instructions or groups of instructions to simulate address line faults.
• Faulty instruction execution to simulate bugs such as the famous Pentium FDIV bug [4]. Note that such a bug has to be mapped onto the hardware architecture simulated by Valgrind.
• Modification of jump conditions and destinations to simulate control flow errors.
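
The first fault type above, together with the split between selecting an injection point and applying the fault, can be illustrated with a minimal sketch in plain C. It is not FITgrind's code and does not use Valgrind's API. All names (fault_kind, fault_spec, apply_fault, should_inject, maybe_corrupt) are hypothetical, and it assumes 32-bit values and an injection point identified by a running operation counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fault kinds for this sketch (not FITgrind identifiers). */
typedef enum { FAULT_BIT_FLIP, FAULT_STUCK_AT_0, FAULT_STUCK_AT_1 } fault_kind;

/* Assumed description of one fault to inject. */
typedef struct {
    fault_kind kind;    /* which fault model to apply                  */
    unsigned   bit;     /* affected bit position, 0..31                */
    uint64_t   trigger; /* index of the dynamic operation to corrupt   */
    bool       permanent; /* false: only at trigger; true: from then on */
} fault_spec;

/* Apply one fault model to a 32-bit operand or result. */
uint32_t apply_fault(fault_kind kind, unsigned bit, uint32_t value)
{
    assert(bit < 32);
    uint32_t mask = UINT32_C(1) << bit;
    switch (kind) {
    case FAULT_BIT_FLIP:   return value ^ mask;  /* invert the bit      */
    case FAULT_STUCK_AT_0: return value & ~mask; /* force the bit to 0  */
    case FAULT_STUCK_AT_1: return value | mask;  /* force the bit to 1  */
    }
    return value; /* unknown kind: leave the value untouched */
}

/* Injection-point selection: is operation op_index affected? */
bool should_inject(const fault_spec *f, uint64_t op_index)
{
    return f->permanent ? op_index >= f->trigger
                        : op_index == f->trigger;
}

/* Hook that added instrumentation code could call on each value. */
uint32_t maybe_corrupt(const fault_spec *f, uint64_t op_index, uint32_t value)
{
    return should_inject(f, op_index)
               ? apply_fault(f->kind, f->bit, value)
               : value;
}
```

For example, apply_fault(FAULT_BIT_FLIP, 0, 0x0F) yields 0x0E. A transient fault_spec with trigger 2 changes only the value passed with op_index 2, while a permanent one changes every value from that index on. In a real tool the hook would be inserted by the instrumentation framework rather than called by hand.
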
It would be also possible to simulate bit flips in instruc-