The Design and Analysis of Thermal-Resilient Hard-Real-Time Systems

Pradeep M. Hettiarachchi¹, Nathan Fisher¹, Masud Ahmed¹, Le Yi Wang², Shinan Wang¹, and Weisong Shi¹
¹Department of Computer Science
²Department of Electrical and Computer Engineering
Wayne State University
{pradeepmh, fishern, masud, lywang, shinan, weisong}@wayne.edu

Abstract: We address the challenge of designing predictable real-time systems in an unpredictable thermal environment where the environmental temperature may change dynamically (e.g., implantable medical devices). To meet this challenge, we propose a control-theoretic design methodology that permits a system designer to specify a set of hard-real-time performance modes under which the system may operate. The system automatically adjusts the real-time performance mode based on the external thermal stress. We show (via analysis, simulations, and a hardware testbed implementation) that our control-design framework is stable and that its control performance is equivalent to that of previous real-time thermal approaches, even under dynamic temperature changes. A crucial and novel advantage of our framework over previous real-time control approaches is the ability to guarantee hard deadlines even during transitions between modes. Furthermore, our system design permits the calculation of a new metric called thermal resiliency, which characterizes the maximum external thermal stress that any hard-real-time performance mode can withstand. Thus, our design framework and analysis may be classified as a thermal stress analysis for real-time systems.

Index Terms: thermal resiliency; multi-mode system; thermal-aware system; thermal-aware periodic resource

I. INTRODUCTION

Modern computer-controlled systems are often deployed in dynamic and unpredictable thermal operating environments.
From the hardware-design perspective, material scientists and computer engineers use rigorous thermal-stress analysis techniques (e.g., see [1]) to determine how the underlying physical hardware will withstand applied internal and external thermodynamic forces. Unfortunately, no equivalent analysis exists for determining the effects of (unpredictable) thermal stress on the performance of the systems software. While hardware capabilities such as dynamic power management (DPM) permit a computing system to reduce its power dissipation at run-time, many embedded systems have real-time constraints that may be adversely affected by unexpected changes in processor speed.

As an example of an embedded system where thermal-stress analysis is essential, consider microprocessors found in implantable medical devices (IMDs). IMDs are increasingly being used to treat various diseases and medical conditions (e.g., pacemakers for heart disease or neural implants to restore hearing/vision). However, recent studies [2], [3] have shown that the heat dissipated from IMDs due to microprocessor activity is non-negligible. Thus, designing IMDs with minimum thermal dissipation is critical, as medical research has shown that a temperature increase of even 1 °C can have long-term effects on tissue [4] and, in the extreme, death may even result from excessive tissue heating [5]. Complicating the safe thermal design of IMDs, body temperature naturally fluctuates over time and varies depending on location [6]. An IMD designer must balance (under temperature fluctuations) the real-time computational requirements of the device with the non-harmful thermal operating limits. In the presence of an increased surrounding temperature, an IMD will have to reduce its computational load to prevent tissue damage due to heat.¹
However, as the correct and safe functioning of the IMD is an absolute requirement, the system designer requires techniques to formally verify the effect of different body temperatures on the correct operation of the IMD.

Similarly, as a less safety-critical example, consider how the quality of audio/video decoding may degrade in a hand-held device as the system reacts to increases in temperature by reducing computational processing (e.g., via instruction fetch toggling). Ideally, a system designer would like to determine how much the performance will degrade under different thermal operating conditions.

Unfortunately, no current formal real-time design and analysis framework fully addresses the above setting. Control-theoretic frameworks have recently been proposed for regulating processor temperature for soft-real-time systems (i.e., systems where jobs are permitted to “occasionally” miss computational deadlines) in an unpredictable thermal environment [8], [9]. While their results successfully show that it is possible to obtain stable and responsive thermal behavior and system utilization control, a system designer cannot use their approaches to determine a priori the amount of system-performance degradation due to changes in the thermal environment. Instead, the level of degradation can only be indirectly inferred via simulations of the system for different operating conditions. Furthermore, hard timing guarantees cannot be made in these frameworks.

¹ As IMD microprocessors typically do not have DVS capabilities, an IMD may have to reduce non-essential tasks such as communication with other nodes in a body-area network [7].

Techniques also already exist for permitting a