SOFTWARE—PRACTICE AND EXPERIENCE
Softw. Pract. Exper., 29(4), 379–395 (1999)

Fault-Tolerant RT-Mach (FT-RT-Mach) and an Application to Real-Time Train Control

A. EGAN, D. KUTZ, D. MIKULIN, R. MELHEM AND D. MOSSÉ∗
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA
(email: {melhem, mosse}@cs.pitt.edu)

SUMMARY
Even though real-time systems have the stringent constraint of completing tasks before their deadlines, many existing real-time operating systems do not implement fault tolerance capabilities. In this paper we summarize a fault-tolerant real-time scheduling policy for dynamic tasks with ready times and deadlines. Our focus in this paper is the implementation, which includes fault-tolerant scheduling, re-scheduling, and recovery mechanisms in the FT-RT-Mach operating system, a fault-tolerant version of RT-Mach. A real-time train control application is then implemented using the FT-RT-Mach operating system. Copyright 1999 John Wiley & Sons, Ltd.

KEY WORDS: fault tolerance; operating system; real-time system; process control

1. INTRODUCTION
In train control, correct execution of tasks is determined not only by logical correctness, but also by satisfying certain temporal constraints. For example, when a train is moving, real-time collision path algorithms as well as velocity and brake control must be carried out within very stringent timing constraints. Tasks with temporal constraints are hard real-time when failure to produce the desired results on time may lead to catastrophic consequences (such as loss of life or high monetary loss in the case of a train accident).

Many theoretical and practical solutions have been proposed for real-time (RT) issues, both for embedded and non-embedded systems. One of them is the RT-Mach operating system, which focuses on solving the problem of executing real-time periodic threads in uniprocessor systems [1].
Several early real-time scheduling policies were implemented, such as Rate-Monotonic Scheduling (RMS) and Earliest Deadline First (EDF) [2], while later implementations added support for continuous media (reserves [3], netphone [4], etc.). Such technology for dealing with real-time constraints has also been used in other commercial operating systems (e.g. QNX, VxWorks, and Mach-RT). All of the above are representatives of preemptive systems. For non-preemptive operating systems, Spring [5,6] and Maruti [7] use explicit timing constraints to schedule real-time tasks, which are then guaranteed to execute within their

∗ Correspondence to: D. Mossé, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Contract/grant sponsor: DARPA; Contract/grant number: DABT52-96-C-0044.
CCC 0038–0644/99/040379–17$17.50
Received 6 May 1998
Revised 17 August 1998, 24 November 1998
Accepted 24 November 1998