Received: 6 March 2018 | Revised: 23 August 2018 | Accepted: 23 August 2018
DOI: 10.1002/spe.2641

RESEARCH ARTICLE

An ensemble CPU load prediction algorithm using a Bayesian information criterion and smooth filters in a cloud computing environment

Sajjad Tofighy 1, Ali A. Rahmanian 2, Mostafa Ghobaei-Arani 3

1 Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
2 Department of Computer Science & Engineering and Information Technology, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
3 Department of Computer Engineering, Qom Islamic Azad University, Qom, Iran

Correspondence
Mostafa Ghobaei-Arani, Department of Computer Engineering, Qom Islamic Azad University, Qom, Iran.
Email: m.ghobaei@qom-iau.ac.ir

Summary
Cloud resource management requires complex policies and decisions to ensure the suitable use of computing resources under fluctuating workload demand. Deciding the right amount of resources for performing user requests in cloud environments is not trivial. Therefore, an efficient resource prediction model can play an important role in cloud resource management by estimating the needed resources properly. In this paper, we propose an ensemble CPU load prediction model that uses a Bayesian information criterion to choose the best constituent model in each time slot based on the cloud resource usage history. Further, we apply a couple of smooth filters to decrease the negative impact of outliers in the observed data points. We also present a framework for cloud resource management that includes a prediction module to estimate the resource usage more accurately. The experimental results on the data set of the CoMon project indicate that the proposed approach achieves higher accuracy than the other ensemble prediction algorithms.
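To make the idea of choosing a constituent model per time slot with a Bayesian information criterion (BIC) more concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian prediction errors, so that BIC can be computed as n ln(RSS/n) + k ln(n). The three candidate predictors, their parameter counts k, and the names `gaussian_bic`, `select_and_predict`, `eval_window`, and `warmup` are all hypothetical choices made for illustration.

```python
import math

def gaussian_bic(residuals, num_params):
    """BIC under an assumed Gaussian error model: n*ln(RSS/n) + k*ln(n)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    rss = max(rss, 1e-12)  # guard against log(0) when the fit is perfect
    return n * math.log(rss / n) + num_params * math.log(n)

# Hypothetical constituent models: each maps a usage history to a one-step forecast.
def predict_last(history):
    return history[-1]

def predict_mean(history, window=3):
    recent = history[-window:]
    return sum(recent) / len(recent)

def predict_trend(history):
    # Linear extrapolation from the last two observations.
    return 2 * history[-1] - history[-2]

# Model name -> (predictor, illustrative parameter count k used in the BIC penalty).
CANDIDATES = {
    "last_value": (predict_last, 1),
    "moving_average": (predict_mean, 1),
    "linear_trend": (predict_trend, 2),
}

def select_and_predict(history, eval_window=8, warmup=3):
    """Score each candidate by BIC on recent one-step-ahead errors, then forecast."""
    start = max(warmup, len(history) - eval_window)
    scores = {}
    for name, (model, k) in CANDIDATES.items():
        residuals = [history[t] - model(history[:t])
                     for t in range(start, len(history))]
        scores[name] = gaussian_bic(residuals, k)
    best = min(scores, key=scores.get)  # lowest BIC wins
    return best, CANDIDATES[best][0](history)

if __name__ == "__main__":
    cpu_load = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # made-up CPU load samples
    print(select_and_predict(cpu_load))
```

Calling `select_and_predict` again at every time slot re-evaluates the candidates on the most recent window, so the selected model can change as the workload pattern changes.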
KEYWORDS
Bayesian information criterion, cloud computing, CPU load prediction, ensemble model

Softw Pract Exper. 2018;1–21. wileyonlinelibrary.com/journal/spe © 2018 John Wiley & Sons, Ltd.

1 INTRODUCTION

Cloud computing is a model that enables end users to access a shared pool of resources, such as compute, network, storage, database, and application resources, as an on-demand service over the Internet on a pay-as-you-go basis.1 Generally, cloud-based computing resources are made available to end customers through three kinds of services: Software as a Service, Platform as a Service, and Infrastructure as a Service (IaaS). In this research, we focus on IaaS, which moves computing from a physical infrastructure to a virtual infrastructure using virtualization technology. All the virtual resources are given to the virtual machines (VMs) that are configured by the service provider. The end users or IT architects use the infrastructure resources in the form of VMs.

Determining the right amount of resources for performing user requests in cloud environments is not trivial.2,3 Therefore, monitoring and predicting VM resource usage, including CPU load, memory usage, and disk space, can be effective. Accurately predicting the VM resources required for a given user request is a challenging issue in cloud resource management because of the under-provisioning and over-provisioning problems. In the over-provisioning problem, the predicted resources for a given user request are more than the actual load of the user's demand, whereas in the under-provisioning problem, the predicted resources are less than the actual load of the user's requirements.4 Hence, in the over-provisioning problem, the user is not affected, but the service provider suffers the waste of idle resources, whereas in the under-provisioning problem,