Fault-Tolerant Scheduling under Time and Resource Constraints

LihChyun Shu, Dept. of Information Management, Chang Jung University, Tainan County, Taiwan 711, ROC
Michal Young, Dept. of Computer and Information Science, University of Oregon, Eugene, OR 97403-1202

1 Introduction

Ghosh et al. [GMM98] presented a novel approach for providing fault tolerance for sets of independent, periodic tasks with rate-monotonic scheduling. We extend this approach to tasks that share logical or physical resources (and hence require synchronization). We show that if the simple rate-monotonic dispatch is replaced by stack-based scheduling [Bak91], the worst-case blocking overhead of the stack resource policy and the worst-case retry overhead for fault tolerance are not additive; rather, only the maximum of the two overheads is incurred.

1.1 Background

We assume that faults are transient and that fault detection is done at the end of each task execution. To ease exposition, we consider a single-fault model in this paper. The analysis results presented in Section 2 can be extended to the case of multiple faults, provided that successive faults occur no closer together than some minimum distance, e.g., the sum of the periods of the two lowest priority tasks, as shown in [GMM98]. Whenever a fault occurs, the affected task is re-executed to recover from the fault. To ensure that the re-executed task can finish before its deadline, we must allocate sufficient slack within the schedule. As noted by Ghosh et al. [GMM98], adding time redundancy can be regarded as reserving processor utilization for a backup task B. The utilization of task B, U_B, is a constant because the added slack can be thought of as evenly distributed throughout the schedule. Hence, for any time interval of length L, the amount of slack reserved during L is U_B · L. To recover from a fault, Ghosh et al.
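The evenly distributed slack reservation can be illustrated with a small sketch. This is our own illustration, not code from the paper; the function names and the (C_i, T_i) pair representation of tasks are assumptions, and U_B = max_i U_i anticipates the condition stated for the recovery scheme below.

```python
def backup_utilization(tasks):
    """U_B: the largest single-task utilization C_i / T_i.

    Reserving this much capacity suffices to re-execute any one
    task after a transient fault (single-fault model).
    tasks: list of (C, T) pairs, C = worst-case execution time,
    T = period.
    """
    return max(c / t for c, t in tasks)

def reserved_slack(tasks, interval_length):
    """Slack accumulated over any interval of length L.

    Because the reserved capacity is treated as evenly distributed
    throughout the schedule, the slack available in an interval of
    length L is simply U_B * L.
    """
    return backup_utilization(tasks) * interval_length
```

For example, with tasks (C, T) = (1, 4) and (2, 10), U_B = 1/4, so an interval of length 8 contains 2 units of reserved slack.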
proposed a recovery scheme, termed RS, which governs how tasks are executed when a fault is detected and the system goes into recovery mode. (This research was partially supported by the National Science Council under grant number NSC 89-2213-E-309-008.)

When the system is in recovery mode, the recovering task τ_r re-executes with its own priority. When a higher priority task τ_H arrives during recovery mode, τ_H will be delayed until recovery is complete, unless τ_H's deadline is earlier than τ_r's. Ghosh et al. show that schedulability can be guaranteed for both normal and recovering tasks if the recovery scheme RS is used, U_B = max_{i=1..n}(U_i), and

Σ_{i=1}^{n} U_i ≤ n(2^{1/n} − 1)(1 − U_B),

where U_i is the utilization of task τ_i, 1 ≤ i ≤ n, i.e., U_i = C_i/T_i. We assume T_i ≤ T_{i+1}, i = 1, ..., n − 1.

We assume that shared resources are protected by critical sections. Tasks gain access to critical sections by locking semaphores, and the stack resource policy [Bak91] is used to control priority inversion. The stack resource policy defines the preemption ceiling of a semaphore s as the priority of the highest priority task that may lock s. The preemption rule does not allow a task τ to preempt another task τ_L unless its priority is higher than the highest preemption ceiling of all semaphores currently locked by τ_L. With the preemption rule, one can show that the stack resource policy possesses the following two important properties: (1) freedom from deadlock, and (2) each task can be blocked for at most the duration of one critical section of any lower priority task.

We quote the following facts about the stack resource policy, due to Baker [Bak91], which were proved under the assumption of fault-free tasks.

Lemma 1 [Bak91] A task τ can be blocked by one critical section of a lower priority task τ_L only if τ_L has entered and remained within a critical section when τ arrives.
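The utilization-based schedulability condition above can be checked directly from the task parameters. The following is a minimal sketch of that test; the function name and task representation are our own, not from the paper:

```python
def rm_ft_schedulable(tasks):
    """Check the fault-tolerant rate-monotonic utilization bound.

    tasks: list of (C_i, T_i) pairs. The condition itself depends
    only on the utilizations, so the rate-monotonic ordering
    T_i <= T_{i+1} assumed in the text need not be enforced here.
    Returns True if sum(U_i) <= n * (2^(1/n) - 1) * (1 - U_B),
    where U_B = max_i U_i is the backup utilization.
    """
    n = len(tasks)
    utils = [c / t for c, t in tasks]
    u_b = max(utils)  # backup utilization U_B
    bound = n * (2 ** (1 / n) - 1) * (1 - u_b)
    return sum(utils) <= bound
```

For instance, two tasks with utilizations 0.1 and 0.05 pass the test (total 0.15 against a bound of 2(√2 − 1)(0.9) ≈ 0.746), whereas a heavily loaded set fails once the (1 − U_B) factor shrinks the classical Liu-and-Layland bound.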
Lemma 2 [Bak91] A task τ can be blocked by a lower priority task τ_L due to the stack resource policy only if the priority of task τ is no higher than the highest preemption ceiling of all the semaphores that are locked by task τ_L when τ arrives.

0-7695-1134-1/01 $10.00 © 2001 IEEE
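The preemption rule underlying both lemmas can be sketched as a simple predicate. This is our own illustration, not code from [Bak91]; it assumes priorities and preemption ceilings are integers with larger values meaning higher priority:

```python
def can_preempt(task_priority, locked_ceilings):
    """Stack resource policy preemption rule (sketch).

    A newly arrived task may preempt the running task only if its
    priority is strictly higher than the highest preemption ceiling
    among all currently locked semaphores. If no semaphore is
    locked, preemption is governed by priority alone, so an
    arriving higher priority task may preempt.
    locked_ceilings: preemption ceilings of currently locked
    semaphores.
    """
    if not locked_ceilings:
        return True
    return task_priority > max(locked_ceilings)
```

Lemma 2 then falls out directly: a task τ that fails this test on arrival is blocked, and that can happen only when τ's priority does not exceed the highest ceiling among the semaphores τ_L holds.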