Adaptive Predictor Integration for System Performance Prediction

Jian Zhang and Renato J. Figueiredo
Advanced Computing and Information Systems (ACIS) Laboratory
Department of Electrical and Computer Engineering
University of Florida, Gainesville, FL 32611, USA
{jianzh, renato}@acis.ufl.edu

Abstract

The integration of multiple predictors promises higher prediction accuracy than can be obtained with a single predictor. The challenge is how to select the best predictor at any given moment. Traditionally, multiple predictors are run in parallel and the one that generates the best result is selected for prediction. In this paper, we propose a novel approach for predictor integration based on the learning of historical predictions. It uses supervised-learning classification algorithms such as k-Nearest Neighbor (k-NN) to forecast the best predictor for the workload under study. Then only the forecasted best predictor is run for prediction. Our experimental results show that it achieved 20.18% higher best-predictor forecasting accuracy than the cumulative-MSE-based predictor selection approach used in the popular Network Weather Service system. In addition, it outperformed the observed most accurate single predictor in the pool for 44.23% of the performance traces.

1 Introduction

Grid computing [11] enables entities to create a Virtual Organization (VO) to share their computation resources such as CPU time, memory, network bandwidth, and disk bandwidth. Predicting the dynamic resource availability is critical to adaptive resource scheduling. However, determining the most appropriate resource prediction model a priori is difficult due to the multi-dimensionality and variability of system resource usage. First, applications may exercise different types of resources during their execution. Usage of some resources, such as CPU load, may be relatively smooth, whereas usage of others, such as network bandwidth, is burstier.
It is hard to find a single prediction model that works best for all types of resources. Second, different applications may have different resource usage patterns. The best prediction model for a specific resource of one machine may not work best for another machine. Third, resource performance fluctuates dynamically due to the contention created by competing applications. Indeed, in the absence of a perfect prediction model, the best predictor for any particular resource may change over time.

1-4244-0910-1/07/$20.00 © 2007 IEEE.

This paper introduces a Learning Aided Adaptive Resource Predictor (LARPredictor), which can dynamically choose the prediction model best suited to the workload at any given moment. By integrating the prediction results generated by the best predictor of each moment during the application run, the LARPredictor can outperform any single predictor in the pool. It differs from the traditional mix-of-expert resource prediction approach in that it does not require running multiple prediction models in parallel all the time to identify the best predictors. Instead, Principal Component Analysis (PCA) and a classification algorithm such as k-Nearest Neighbor (k-NN) are used to forecast the best prediction model from a pool, based on monitoring and learning of the historical resource availability and the corresponding prediction performance.

The LARPredictor is inspired by the VMPlant [19] project, which provides automated cloning and configuration of Virtual Machines (VMs). The virtual machines are highly configurable in terms of hardware and software. It is possible to adapt the machine configurations to the changing workload to achieve better resource allocation.
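As a rough illustration of the predictor-forecasting idea introduced above, the following sketch labels each historical window with the predictor that performed best on it, then uses a k-NN vote to forecast the best predictor for a new window, so that only that one predictor needs to run. The two-model pool (`LAST`, `MEAN`), the window length `m`, the choice `k = 3`, and all function names are assumptions made for this example and are not taken from the paper; the PCA step is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical predictor pool; the models and names are illustrative only.
def last_value(window):
    return window[-1]

def window_mean(window):
    return sum(window) / len(window)

POOL = {"LAST": last_value, "MEAN": window_mean}

def label_windows(trace, m):
    """Training step: run every predictor in the pool on each window of
    length m and label the window with the predictor whose squared error
    against the next observed value is smallest."""
    samples = []
    for t in range(m, len(trace)):
        window = trace[t - m:t]
        actual = trace[t]
        best = min(POOL, key=lambda name: (POOL[name](window) - actual) ** 2)
        samples.append((window, best))
    return samples

def forecast_best(samples, window, k=3):
    """k-NN vote: return the label that is most common among the k
    training windows closest (Euclidean distance) to the query window."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], window))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def predict(samples, window, k=3):
    """Run only the forecasted best predictor on the query window."""
    name = forecast_best(samples, window, k)
    return name, POOL[name](window)

if __name__ == "__main__":
    # Synthetic trace: a smooth ramp followed by a bursty alternating phase.
    trace = list(range(20)) + [0, 10] * 10
    samples = label_windows(trace, m=4)
    print(predict(samples, [5, 6, 7, 8]))
    print(predict(samples, [0, 10, 0, 10]))
```

In this sketch all predictors run only while the training labels are built; at prediction time a single model is evaluated per window, which is the contrast with running the whole pool in parallel.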
The learning aided adaptive resource performance prediction can be used to support dynamic VM provisioning by providing accurate prediction of the resource availability of the host server and the resource demand of the applications that are reflected by the hosting virtual machines.

Our experimental results based on the analysis of a set of virtual machine trace data show:

1. The best prediction model is workload specific. In the absence of a perfect prediction model, it is hard to find a single predictor which works best across virtual machines