IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 9, SEPTEMBER 2011 1359

Temperature Aware Dynamic Workload Scheduling in Multisocket CPU Servers

Raid Ayoub, Student Member, IEEE, Krishnam Indukuri, Member, IEEE, and Tajana Simunic Rosing, Member, IEEE

Abstract—In this paper, we propose a multitier approach for significantly lowering the cooling costs associated with fan subsystems without compromising system performance. Our technique manages the fan speed by intelligently allocating the workload at the core level as well as at the CPU socket level. At the core level, we propose a proactive dynamic thermal management scheme. We introduce a new predictor that utilizes the band-limited property of the temperature frequency spectrum. A big advantage of our predictor is that it does not require a costly training phase and still maintains high accuracy. At the socket level, we use a control-theoretic approach to develop a stable scheduler that further reduces the cooling costs by providing a better thermal distribution. Our thermal management scheme incorporates runtime workload characterization to perform efficient thermally aware scheduling. The experimental results show that our approach delivers an average cooling energy savings of 80% compared to state-of-the-art techniques. The reported results also show that our formal technique maintains stability while heuristic solutions fail in this respect.

Index Terms—Fan control, multiple cores, multiple CPU sockets, multitier thermal management, state-space control, temperature prediction.

I. Introduction

Modern servers are commonly equipped with multiple CPU sockets to cope with the increasing demand of computationally intensive applications [1], [2]. To further increase the computational power, an additional layer of parallel processing is implemented within the CPU socket, where each socket is a chip multiprocessor (CMP, e.g., Intel Xeon quad core processor).
However, this complex level of integration, coupled with the high performance of the processors, leads to higher power densities [3]. The high power density causes thermal hot spots in the system that have a substantial effect on reliability and leakage power [4]. It also degrades performance since interconnect delays increase with temperature [5]. Dissipating the excess heat is one of the biggest challenges as it requires a complex and energy-hungry cooling subsystem.

Manuscript received January 25, 2011; accepted March 21, 2011. Date of current version August 19, 2011. This work was supported by the National Science Foundation (NSF) Project GreenLight, under Grant 0821155, the NSF SHF, under Grant 0916127, the NSF ERC CIAM, the NSF Variability, the NSF Flash Gorden, CNS, the NSF IRNC, Translight/Starlight, Oracle, Google, Microsoft, MuSyC, UC Micro, under Grant 08-039, and Cisco. This paper was recommended by Associate Editor Y. Xie.
The authors are with the Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: rayoub@cs.ucd.edu; kindukur@ucsd.edu; tajana@ucsd.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2011.2153852

The cooling subsystems in high-end servers are designed based on the concept of forced convection, where controlling the rate of air flow improves the heat transfer between the heat sink and the ambient to meet the desired value. The air flow is normally generated using a fan subsystem that focuses the air toward the CPU heat sink. To control the fan speed, closed-loop controllers are normally used, where CPU thermal sensors provide the feedback signal to the controller. Fan-based cooling subsystems increase the air-flow rate to match a corresponding rise in temperature.
However, the challenge is the substantial increase in cooling power due to the cubic relationship between fan speed and its power [6]. The fan system in high-end servers consumes as much as 80 W in 1U rack servers [6] and 240 W or more in 2U rack servers [1]. The results in [7] show that the fan power can reach up to 51% of the overall server power budget. Moreover, the increase in fan speed introduces large noise levels in the system. Acoustic noise levels increase by 10 dB as the air-flow rate increases by 50% [8]. Such an increase in noise level not only leads to an uncomfortable working environment but also causes vibrations that may impact reliability. Minimizing the rate of air flow to just what is needed for cooling is essential to deliver energy efficiency at a minimal acoustic noise level.

Current operating systems employ dynamic load balancing (DLB) to enhance the utilization of the system resources. The DLB performs thread migration to lower the difference in task queue lengths of the individual computational units [9]. Nevertheless, thermal hot spots may still occur in the CPUs since DLB does not consider temperature in allocating the workload. When the number of running threads is less than the number of physical cores, the DLB does not initiate any migrations since the workload is balanced from a performance point of view. Such scenarios can result in hot spots as a portion of the cores could be highly active while the others are idle.

To manage the high temperature within a single CPU socket, a number of dynamic thermal management (DTM) techniques have been proposed. Reactive techniques proposed in the literature manage the temperature only once it reaches a critical threshold. Employing these techniques comes at a high price in performance overhead. A few predictive thermal migration techniques have been proposed recently that can predict the occurrence of thermal emergencies ahead of time [10]–[12].
Although temperature prediction is fairly accurate, a training phase is required that impacts the performance and prediction opportunities. Besides, prior techniques are limited to single-socket CPU designs, and cooling dynamics are not modeled with sufficient accuracy. We identify these as major drawbacks that motivate us to propose a multitier proactive work-