Firestorm: Operating Systems for Power-Constrained Architectures

Sankaralingam Panneerselvam and Michael M. Swift
Computer Sciences Department, University of Wisconsin–Madison
{sankarp, swift}@cs.wisc.edu

Abstract

The phenomenon of Dark Silicon has left processors over-provisioned with compute units that cannot all be used at full performance without exceeding power limits. Such limits exist primarily to control heat dissipation. Current systems support mechanisms that keep the system as a whole within its power and thermal limits. However, these mechanisms are not sufficient to provide process-level control or to ensure application-level SLAs: power may be wasted on low-priority applications while high-priority ones are throttled.

We built Firestorm, an operating system extension that introduces power and thermal awareness. Firestorm treats power as a limited resource and distributes it to applications based on their importance. To control temperature, Firestorm also introduces the notion of thermal capacity as another resource that the OS manages. These abstractions, implemented in Firestorm with mechanisms and policies to distribute power and limit heat production, help applications achieve guaranteed performance while staying within system limits. In experiments, we show that Firestorm improved performance by up to 10% by avoiding thermal interference and can guarantee SLAs for soft real-time applications in the presence of limited power and competing applications.

1. Introduction

Moore's law paved the way for doubling the number of transistors in the same chip area by shrinking transistor sizes with every generation while also scaling voltage down. However, with the end of Dennard scaling, voltage, and hence the power draw of transistors, is no longer dropping proportionally to size. As a result, modern processors cannot use all parts of the processor simultaneously without exceeding the power limit.
This manifests as an increasing proportion of dark silicon [7]. In other words, the compute capacity of current and future processors is, and will continue to be, over-provisioned with respect to the available power.

Power limits are influenced by several factors, such as the capacity of the power distribution infrastructure, battery supply limits, and the thermal capacity of the system. Power limits in datacenters can arise from under-provisioning power distribution units relative to peak power draw. Energy limits are also dictated by the limited capacity of batteries. However, in many systems, the primary limit comes not from the ability to acquire power, but from the ability to dissipate it as heat once it has been used.

Thermal limits are dictated by the physical properties of the processor materials and also by the comfort of the user: people do not want their legs scorched when sitting with a laptop. Thus, power is limited to prevent processor chips from overheating, which can lead to thermal breakdown. As a result, the maximum performance of a system is limited by its cooling capacity, which determines its ability to dissipate heat. Cooling capacity varies across the computing landscape, from servers with external chilled air, to desktops with large fans, to laptops, to fan-less mobile devices. Furthermore, cooling capacity can change dynamically with software-controlled fans [32] or physically reconfigurable systems, such as dockable tablets [33].

Processors support mechanisms to enforce both power and temperature limits. For example, recent Intel processors provide Running Average Power Limit (RAPL) counters to enforce a power limit on the entire processor [27]. In software, power capping services, such as the Linux power capping framework [25], use these limits to control power usage. Processor vendors define a metric, Thermal Design Power (TDP), for every processor model to guide the requirements of the cooling system needed to dissipate power.
Most processors have a safeguard mechanism that throttles the processor, by reducing its frequency or duty cycle (the fraction of cycles in which work happens), on reaching a critical temperature. In software, the thermal daemon [34] aims to sustain performance by deploying additional cooling (e.g., increasing fan speed), if possible, before resorting to throttling.

Challenges. The drawback of current hardware and software mechanisms that enforce power and thermal limits is that they offer only system-wide guarantees and do not enable application-level guarantees.

Power distribution: When power is limited, current systems (hardware and software) throttle all applications equally. However, this approach ignores users' scheduling priorities: