Engineering with Computers (2011) 27:381–391
DOI 10.1007/s00366-011-0211-4

ORIGINAL ARTICLE

Task scheduling with ANN-based temperature prediction in a data center: a simulation-based study

Lizhe Wang • Gregor von Laszewski • Fang Huang • Jai Dayal • Tom Frulani • Geoffrey Fox

Received: 30 September 2010 / Accepted: 13 December 2010 / Published online: 18 February 2011
© Springer-Verlag London Limited 2011

L. Wang • G. von Laszewski (✉) • G. Fox
Pervasive Technology Institute, Indiana University, 2719 E. 10th St., Bloomington, IN 47408, USA
e-mail: laszewski@gmail.com

F. Huang
Institute of Geo-Spatial Information Technology, College of Automation, University of Electronic Science and Technology of China, Chengdu 611731, People's Republic of China

J. Dayal
College of Computing, Georgia Institute of Technology, Atlanta, USA

T. Frulani
Center for Computational Research, NYS Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, SUNY, 701 Ellicott St., Buffalo, NY 14203, USA

Abstract High temperatures within a data center can cause a number of problems, such as increased cooling costs and increased hardware failure rates. To address these problems, researchers have shown that workload management focused on a data center's thermal properties effectively reduces temperatures within the data center. In this paper, we propose a method for predicting a workload's thermal effect on a data center that is suitable for real-time scenarios. We use machine learning techniques, specifically artificial neural networks (ANNs), as our prediction methodology, and we conduct our experiments with real data collected during a data center's normal operation. To reduce the complexity of the data, we introduce a thermal impact matrix that captures the spatial relationships among the data center's heat sources, such as the compute nodes. Our results show that machine learning techniques can predict a workload's thermal effects quickly enough to be well suited for real-time scenarios. Based on these temperature prediction techniques, we developed a thermal-aware workload scheduling algorithm for data centers that aims to reduce power consumption and temperatures. We evaluate the algorithm's performance in a simulation study. Simulation results show that our algorithm can significantly reduce temperatures in data centers at the cost of a tolerable decrease in performance.

Keywords Data center • Green computing • Workload scheduling

1 Introduction

A data center is a facility that houses a number of computing systems, such as high-performance clusters, telecommunications equipment, and storage systems. Nowadays, data centers play a key role in modern IT infrastructure.

Power usage is the most expensive portion of a data center's operational costs. The U.S. Environmental Protection Agency (EPA) recently reported that 61 billion kWh, 1.5% of US electricity consumption, is used for data center computing [1]. The energy consumption of data centers doubled between 2000 and 2006, and, if this trend continues, the EPA estimates that it will double again by 2011. Power and cooling are reported to be the most significant costs in data centers [2], with cooling alone accounting for up to 50% of the total energy cost [3]. Even with more efficient cooling technologies, such as those used in IBM's BlueGene/L and in Ranger, a cluster at the Texas Advanced Computing Center (TACC), cooling still accounts for a significant portion of these data centers' total energy cost. Moreover, the reliability of a computer system's hardware is directly related to its operating temperature.
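As a quick consistency check on the EPA figures quoted in the introduction, the following sketch derives the total US electricity consumption implied by "61 billion kWh, 1.5% of US consumption", along with the constant annual growth rates implied by the two doublings (2000 to 2006, and 2006 to 2011). It assumes that "double" means exactly 2x growth with constant annual compounding. The function and constant names are illustrative only, and the sketch uses no data beyond the figures quoted in the text.

```python
"""Back-of-the-envelope check of the EPA data-center energy figures.

Assumptions (illustrative, not from the EPA report itself):
- "double" means exactly 2x growth with constant annual compounding;
- the only inputs are the two figures quoted in the text.
"""

DC_USAGE_2006_BKWH = 61.0  # data-center consumption, billion kWh (EPA [1])
DC_SHARE_OF_US = 0.015     # 1.5% of total US electricity consumption


def implied_total(usage: float, share: float) -> float:
    """Total consumption implied by a partial usage and its share of the total."""
    return usage / share


def annual_growth_for_doubling(years: int) -> float:
    """Constant annual growth rate r such that (1 + r) ** years == 2."""
    return 2.0 ** (1.0 / years) - 1.0


if __name__ == "__main__":
    total = implied_total(DC_USAGE_2006_BKWH, DC_SHARE_OF_US)
    print(f"Implied total US consumption: {total:,.0f} billion kWh")
    # "Double again by 2011" applied literally to the 2006 figure.
    print(f"Implied 2011 usage if it doubles again: {2 * DC_USAGE_2006_BKWH:.0f} billion kWh")
    print(f"Implied growth 2000-2006: {annual_growth_for_doubling(6):.1%} per year")
    print(f"Implied growth 2006-2011: {annual_growth_for_doubling(5):.1%} per year")
```

Running the script gives an implied national total of about 4,067 billion kWh and an implied 2011 data-center usage of 122 billion kWh. The implied growth rates are roughly 12.2% per year for the first doubling and 14.9% per year for the second. The second doubling therefore assumes the trend accelerates rather than merely continues at the same annual rate.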