When is the Right Time to Inject an Error?

Andréas Johansson, Constantin Sârbu and Neeraj Suri
Department of Computer Science, Technische Universität Darmstadt
{aja,cs,suri}@informatik.tu-darmstadt.de

1. Introduction

As software (SW) comes to play a crucial role in computer systems, its robustness will have a major impact on the system’s overall robustness. Being able to assess and improve SW robustness has been identified as an important field of research. In our research we target robustness evaluation and enhancement of a key component in SW-based systems, namely the operating system (OS). Many techniques have been developed for this purpose, and this paper deals with one of them: Fault Injection (FI). In FI the state of a system (memory, registers etc.) is intentionally corrupted and the behavior of the system is observed. Common types of FI include corruption of memory or of the control path, and corruption of function calls and their parameters.

When conducting FI experiments there are many parameters that must be considered properly, such as the error model, the type of injection technique and the measures that are to be taken. In this paper we focus on another important aspect of FI, namely the instance of injection¹. The injection instance clearly plays a major role in the outcome of an experiment. As the state of the system changes over time, different injection instances might lead to injection into different system states. Therefore efforts have been spent on trying to minimize the difference between the states of different injection rounds. The most common approach is to restart the target cleanly before each experiment, e.g., for OSs typically by performing a reboot. Additionally, errors are often injected on first occurrence, i.e., on the first call to a certain function or instruction in the code [1, 2, 3, 5]. This is a reasonable approach to take, as it makes experiments more predictable and repeatable.
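
As an illustration only, and not the tooling used in the cited work, first-occurrence injection into a function parameter can be sketched in Python as a wrapper that corrupts the argument on exactly one call. The names inject_on_occurrence, flip_lowest_bit and read_block are hypothetical, and the single bit flip is an assumed error model chosen for brevity.

```python
import functools


def inject_on_occurrence(corrupt, occurrence=1):
    """Wrap a function so its first argument is corrupted on exactly one call.

    occurrence=1 corresponds to first-occurrence injection.
    """
    def decorator(func):
        calls = 0  # number of calls seen since the (clean) start

        @functools.wraps(func)
        def wrapper(first, *args, **kwargs):
            nonlocal calls
            calls += 1
            if calls == occurrence:
                first = corrupt(first)  # the injected error
            return func(first, *args, **kwargs)

        return wrapper

    return decorator


def flip_lowest_bit(value):
    """Assumed error model: flip bit 0 of an integer parameter."""
    return value ^ 1


@inject_on_occurrence(flip_lowest_bit, occurrence=1)
def read_block(block_number):
    """Stand-in for a service of the target; it only echoes its parameter."""
    return block_number


# Three consecutive calls after a clean start: only the first is corrupted.
results = [read_block(8) for _ in range(3)]
print(results)  # [9, 8, 8]
```

Only the first call sees the corrupted parameter and every later call passes through unchanged, so the same experiment can be repeated after each clean restart of the target.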
However, the set of states that can be targeted this way is only a subset of the total state space, and some states might never be targeted. To gain confidence in the experiments, one would like the FI experiments to cover as large a part of the system state space as possible. Therefore, we believe that only considering first occurrence is insufficient for achieving high levels of confidence in the results of the experiments. Consequently, this paper presents an idea where the state of the system, defined as the current "task" of the system, is used to define the time of injection. We have developed the idea further in the context of device drivers, where the current "task" is more easily defined. We have scoped out two major activities: first, we will investigate how well current techniques "cover" the state of a driver; second, we will define new FI experiments based on the state-driven approach and compare the results with normal, first-occurrence results.

¹ This work has been supported, in part, by EC FP6 IP DECOS, NoE ReSIST and also by Microsoft Research.

2. System Model

The system model used in this paper is that of a simple four-layered system, with HW, drivers, OS and applications, depicted in Figure 1. Drivers interact with the rest of the OS by receiving requests for services, using a predefined interface. Drivers are considered as independent components processing requests on behalf of the OS. Drivers may also use services provided by the OS to fulfill these requests. Furthermore, we do not assume source code access to either the driver or the OS.

[Figure 1. System model: a four-layer stack with the layers Applications, Operating System, Device Drivers and Hardware.]

2.1. Driver Modes

To avoid confusion with state machine testing we refer to the "state" of a driver as its mode. The mode of a driver is defined by the services it is currently serving [4]. A service in this context can be a function or a request message. Linux, for instance, uses function calls, while Windows XP