The International Journal of Time-Critical Computing Systems, 20, 51–81, 2001
© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Dynamic Scheduling and Fault-Tolerance: Specification and Verification

TOMASZ JANOWSKI tj@iist.unu.edu
The United Nations University, International Institute for Software Technology, P.O. Box 3058, Macau

MATHAI JOSEPH mathai@pune.tcs.co.in
Tata Research Development and Design Centre, Pune, India

Abstract. Consider a distributed real-time program which is executed on a system with a limited set of hardware resources. Assume the program is required to satisfy some timing constraints despite the occurrence of anticipated hardware failures. For efficient use of resources, scheduling decisions must be taken at run-time, considering deadlines, the load and hardware failures. The paper demonstrates how to reason about such dynamically scheduled programs in the framework of a timed process algebra and modal logic. The algebra provides a uniform process encoding of programs, hardware and schedulers, with an operational semantics of a process that depends on the assumptions about faults. The logic specifies the timing properties of a process and verifies them via this fault-affected semantics, establishing fault-tolerance. The approach lends itself to the application of existing tools and results that support reasoning in process algebras and modal logics.

Keywords: real-time distributed systems, provable fault-tolerance, provable schedulability, timed process algebra, timed modal logic

1. Introduction

Consider a distributed real-time program which consists of a fixed number of tasks, each with a possibly unbounded sequence of invocations. Periodic tasks are invoked at regular intervals by timers, and sporadic tasks are invoked by the environment or by some other task. Tasks are statically partitioned among distributed nodes, each with its own local memory and clock, and all connected by a network.
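
The task model just described can be pictured with a small Python sketch. It is purely illustrative: the names `Task`, `timer_invocations` and `tasks_on`, the integer time units and the default release offset of zero are hypothetical assumptions, not part of the paper, which encodes programs as processes in a timed process algebra rather than as data structures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


# Hypothetical representation of the task model; NOT the paper's process encoding.
@dataclass(frozen=True)
class Task:
    name: str
    node: str                     # node the task is statically assigned to
    period: Optional[int] = None  # timer interval (integer time units); None means sporadic

    def __post_init__(self) -> None:
        if self.period is not None and self.period <= 0:
            raise ValueError("period must be positive")

    @property
    def periodic(self) -> bool:
        return self.period is not None


def timer_invocations(task: Task, horizon: int, offset: int = 0) -> list[int]:
    """Invocation times of a periodic task in [offset, horizon), one per timer expiry."""
    if not task.periodic:
        # Sporadic tasks have no timer: the environment or another task invokes them.
        raise ValueError(f"{task.name} is sporadic and has no timer-driven invocations")
    return list(range(offset, horizon, task.period))


def tasks_on(node: str, tasks: list[Task]) -> list[Task]:
    """The static partition: all tasks assigned to the given node."""
    return [t for t in tasks if t.node == node]


if __name__ == "__main__":
    tasks = [
        Task("sample", node="n1", period=10),  # periodic, driven by n1's clock
        Task("log", node="n1", period=25),     # periodic
        Task("alarm", node="n2"),              # sporadic
    ]
    print(timer_invocations(tasks[0], horizon=35))  # [0, 10, 20, 30]
    print([t.name for t in tasks_on("n1", tasks)])  # ['sample', 'log']
```

Sporadic tasks are deliberately rejected by `timer_invocations`, since their invocation times depend on the environment or on other tasks and cannot be enumerated in advance from a period alone.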
Clocks are used to implement timers, and tasks can communicate by shared memory if they are located at the same node, or otherwise by message passing. The hardware on which the program executes is potentially unreliable: for example, processors may fail, memory may be corrupted and communications may be delayed. The program is required to meet deadlines despite the occurrence of such anticipated faults. A deadline may require, for example, that a task produce correct output within a specified time after receiving input, or it may relate the timing of actions in one or more, possibly remote, tasks. In this paper we consider the problem of verifying such deadlines. We carry out verification by model-checking, which takes into account both resource limitations, by syntactically modelling part of the program’s execution environment, and the unpredictability of faults, by ensuring that fault-tolerance for a set of anticipated faults implies fault-tolerance for any subset of those faults (in particular, for the empty set of faults). Model-checking takes the form P, F |= D. P is the program we want to verify, described together with part of its execution environment as a process in the timed process algebra