A Machine Learning-based Approach to Live Migration Modeling

Changyeon Jo, Changmin Ahn, and Bernhard Egger
Department of Computer Science and Engineering
Seoul National University
Seoul, Korea
Email: {changyeon,changmin,bernhard}@csap.snu.ac.kr

Abstract—Live migration is one of the core technologies for increasing the efficiency of data centers: it enables better power savings, higher utilization, and load balancing, and it simplifies maintenance. With service-level agreements (SLA) in place, the overhead of live migration in terms of resources consumed on the host, plus the performance reduction and downtime of the migrated VM, poses a major obstacle to applying live migration effectively. With various live migration algorithms available, an important question is which of the algorithms can provide optimal performance while respecting the SLAs. In this work, we propose a versatile model that is able to accurately predict the key metrics of live migration. The machine-learned model is trained with data from over 10,000 VM migrations and evaluated for the five live migration algorithms available in the latest QEMU/KVM virtualization environment. The evaluation shows that the proposed model is able to predict the total migration time and the total transferred data with over 90% accuracy, and that the 90th percentile error of the downtime is 280 ms.

1. Introduction

Virtualization allows data center operators to better utilize their resources by running multiple virtual machines on one physical host. In order to adapt to fluctuating workloads in virtual machines and optimize the utilization of hardware resources, virtual machines can be live migrated [3], i.e., moved from one physical host to another while the virtual machine (VM) keeps running. To balance the load between servers, VMs running on over-committed hosts can be migrated to idle servers.
On the other hand, the VMs of a lightly-loaded server can be consolidated onto another machine, and the now idle server can be turned off, thereby increasing the power efficiency of the data center.

Migrating a VM requires copying its volatile state from the source to a destination host. The simplest approach, stop-and-copy, stops the VM on the source, completely transfers the VM’s state, and finally resumes the VM on the destination host. In the presence of service-level agreements (SLA) between the data center operator and the owner of the VM requiring a certain availability of service, the stop-and-copy approach is infeasible because of the long period during which the VM remains unavailable.

This downtime is not the only key metric of live migration. Other important factors include the total migration time, the total amount of data transferred, and the performance degradation of the VM being migrated. Additionally, the amount of CPU and memory resources and the network bandwidth required by the migration may also be of interest, especially in a resource-constrained environment.

Over the past decade, a number of live migration techniques have been proposed [3], [4], [5], [7], [10], each of which aims at optimizing one or several of the above metrics. The proposed techniques range from copying the volatile state iteratively while the VM keeps running on the source host to moving the core of the VM immediately and fetching outstanding data on demand. A number of orthogonal optimizations such as data compression or CPU throttling have been proposed as well. The different techniques exhibit distinct characteristics in the key metrics for identical workloads. In addition, the performance of a given technique varies widely depending on the workload running inside the VM and on the host.
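To make the trade-off between stop-and-copy and iterative copying concrete, the following is a minimal sketch of a simplified analytical model. It is not the model proposed in this paper, nor QEMU's implementation: it assumes a constant network bandwidth and a constant page dirty rate, and all names (`MigrationEstimate`, `stop_and_copy`, `pre_copy`, `stop_threshold_mb`, `max_rounds`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MigrationEstimate:
    rounds: int            # copy rounds performed while the VM keeps running
    total_time_s: float    # total migration time in seconds
    downtime_s: float      # time during which the VM is paused, in seconds
    transferred_mb: float  # total amount of data sent, in MB


def stop_and_copy(mem_mb: float, bw_mb_s: float) -> MigrationEstimate:
    """Pause the VM, send all memory, resume: downtime equals total time."""
    t = mem_mb / bw_mb_s
    return MigrationEstimate(0, t, t, mem_mb)


def pre_copy(mem_mb: float, bw_mb_s: float, dirty_mb_s: float,
             stop_threshold_mb: float = 64.0,
             max_rounds: int = 30) -> MigrationEstimate:
    """Iteratively resend the memory dirtied during the previous round.

    Assumption: bandwidth and dirty rate stay constant for the whole
    migration, which real workloads generally do not satisfy.
    """
    to_send = mem_mb
    total_time = 0.0
    transferred = 0.0
    rounds = 0
    while to_send > stop_threshold_mb and rounds < max_rounds:
        t = to_send / bw_mb_s
        total_time += t
        transferred += to_send
        rounds += 1
        # Memory dirtied while this round was in flight; it can never
        # exceed the size of the VM's memory.
        to_send = min(mem_mb, dirty_mb_s * t)
    # Final phase: pause the VM and send whatever is still dirty.
    downtime = to_send / bw_mb_s
    return MigrationEstimate(rounds, total_time + downtime,
                             downtime, transferred + to_send)


if __name__ == "__main__":
    print(stop_and_copy(4096, 100))
    print(pre_copy(4096, 100, 10))
```

Under these assumptions, a VM with 4096 MB of memory, a 100 MB/s link, and a dirty rate of 10 MB/s converges after two copy rounds: the downtime drops from 40.96 s for stop-and-copy to about 0.41 s, at the cost of a longer total migration time and roughly 11% more transferred data. When the dirty rate reaches or exceeds the bandwidth, the sketch never converges and falls back to a full-memory final copy after `max_rounds` rounds, which illustrates why the metrics depend so strongly on the workload.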
In order to apply live migration effectively, an important problem for data center operators is thus to select the best migration technique as a function of SLAs, the operator’s optimization policy, and the workload characteristics of the VM and the host.

In this work, we present a method to build accurate performance estimation models for live migration. Based on an analysis of a large dataset of live migration profiles of diverse workloads migrated with different live migration techniques under varying resource constraints, we employ machine learning techniques to automatically generate a performance prediction model. The model can estimate the total migration time, the total amount of data transferred, and the VM downtime for the different migration techniques. By virtue of the automatic approach, new migration algorithms and profile features can be easily added, rendering the proposed procedure flexible and extensible. We verify the feasibility of the proposed approach in the QEMU/KVM virtualization environment [6]. Based on over 10,000 migration profiles, the generated model predicts the total migration time, the total amount of transferred data, and the downtime with high accuracy. In ongoing research, we employ the model in a data center optimization framework to select the best algorithm for a given situation and constraints.

The remainder of this paper is organized as follows: Section 2 provides the necessary background on live migration and discusses the different live migration techniques. Section 3 details the data collection and model building process, and Section 4 evaluates the models. Finally, Section 5 concludes the paper and discusses future work.

Changyeon Jo, Changmin Ahn, and Bernhard Egger. "A Machine Learning-based Approach to Live Migration Modeling." Presented at the 4th International Workshop on Efficient Data Center Systems (EDCS'16), Seoul, Korea, June 2016. http://csap.snu.ac.kr/publications