2168-7161 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2017.2780159, IEEE Transactions on Cloud Computing

Amazon EC2 Spot Price Prediction using Regression Random Forests

Veena Khandelwal, Anand Kishore Chaturvedi, Chandra Prakash Gupta

Abstract— Spot instances were introduced by Amazon EC2 in December 2009 to sell its spare capacity through an auction-based market mechanism. Despite its extremely low prices, the cloud spot market has low utilization. Because spot pricing is dynamic, spot instances are prone to out-of-bid failure. Bidding complexity is another reason why users today still fear using spot instances. This work presents a Regression Random Forests (RRFs) model to predict one-week-ahead and one-day-ahead spot prices. The prediction would assist cloud users in planning when to acquire spot instances and in estimating execution costs, and would also assist them in making bid decisions that minimize execution costs and out-of-bid failure probability. Simulations with 12 months of real Amazon EC2 spot history traces to forecast future spot prices show the effectiveness of the proposed technique. A comparison of RRFs-based spot price forecasts with existing non-parametric machine learning models reveals that RRFs-based forecasts are more accurate than those of the other models. We evaluate the models using MAPE, MCPE, OOBError and speed. Evaluation results show that <= 10% for 66 to 92% and <= 15% for 35 to 81% of one-day-ahead predictions with prediction time less than one second. <= 15% for 71 to 96% of one-week-ahead predictions.
Index Terms— Amazon EC2, Compute instances, One-day-ahead prediction, One-week-ahead prediction, Regression Random Forests, Spot instances, Spot price prediction

1. INTRODUCTION AND MOTIVATION

THE on-demand scalability characteristic of cloud computing forces cloud service providers to overestimate their resources in order to meet the peak load demands of their customers, which occur at different time periods and may not overlap. Due to this overestimation, a large number of cloud resources are idle during off-peak hours. Cloud providers also face the problem of allocating resources while keeping in view users' different job requirements and data center capacity. Different types of users and multiple types of requirements further aggravate the resource allocation problem. Also, demand for cloud resources fluctuates due to today's usage-based pricing plans. In order to manage these demand fluctuations, more flexible pricing plans are required to sell resources according to real-time market demand.

Spot pricing was introduced by Amazon EC2 in December 2009 to minimize operational cost, combat underutilization of its resources and make more profit. Similar to on-demand instances, spot instances offer several instance types comprising different combinations of CPU, memory, storage and networking capacity. Amazon Web Services (AWS) is not the only participant in the spot instance realm. Google Compute Engine launched its preemptible Virtual Machines on September 8, 2015, designed for workloads that can be delayed and are fault tolerant at the same time. Users can bid for spot instances (SIs), where prices are charged at the lowest bid price, whereas pricing on Google Preemptible VMs is fixed at a per-hour rate. The distinguishing feature of Amazon Elastic Compute Cloud (EC2) spot instances is their dynamic pricing.

From the customer's perspective, spot instances offer the prospect of low-cost utility computing at the risk of out-of-bid failure at any time by Amazon EC2.
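The out-of-bid risk mentioned above can be illustrated with a minimal sketch. It assumes, for illustration only, that an instance keeps running while the user's maximum bid is at least the current spot price and fails as soon as the price rises above the bid. The function names and the price trace are hypothetical and are not taken from the paper or from any Amazon EC2 API.

```python
from typing import Optional, Sequence


def first_out_of_bid(prices: Sequence[float], bid: float) -> Optional[int]:
    """Return the index of the first price that exceeds the bid, or None."""
    for i, price in enumerate(prices):
        if price > bid:
            return i
    return None


def availability(prices: Sequence[float], bid: float) -> float:
    """Return the fraction of samples in which the bid covers the spot price."""
    if not prices:
        raise ValueError("price trace is empty")
    covered = sum(1 for price in prices if price <= bid)
    return covered / len(prices)


# Hypothetical hourly spot prices in USD, invented for illustration only.
trace = [0.031, 0.030, 0.034, 0.052, 0.033, 0.029]

print(first_out_of_bid(trace, bid=0.040))
print(availability(trace, bid=0.040))
```

Under these assumptions, raising the bid delays or avoids the first failure at the price of a higher cost ceiling, which is the trade-off that a price forecast is meant to inform.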
Spot instance reliability depends on the market price and the user's maximum bid (limited by their hourly budget). Spot prices vary dynamically in real time based on demand (users' bids) and supply (resource availability) for spot instance capacity in data centers across the globe. Users bid for spot instances and thereby control the balance of reliability versus monetary cost. The price for spot instances can sometimes be as low as one eighth of the price of on-demand instances. On the other hand, it is also not uncommon for spot prices to surpass on-demand prices in cloud data centers. When demand is low, spot prices are low because fewer users are bidding for the same instance. Therefore, a bidder's probability of incurring less monetary cost is higher. When demand is high, users are willing to pay more to get access, and hence spot prices increase.

Spot pricing in particular is a pricing model targeted at divisible computing jobs that can shift the time of processing to when computing resources are available at low cost [1]. The primary requirement is that the applications must be time flexible, must not have a strict completion deadline and should be interrupt tolerant. Spot instances are also required for executing certain sudden tasks which do not need reserved instances.

————————————————
Veena Khandelwal is a PhD student in the Department of Computer Science & Engineering at Rajasthan Technical University, Kota, India. E-mail: vn.khandelwal@gmail.com
Dr. A. K. Chaturvedi is presently working as Principal, MLV Textile and Engineering College, Bhilwara, affiliated to Rajasthan Technical University, Kota. E-mail: chaturvedi101@gmail.com
Dr. C.P. Gupta is Professor in the Department of Computer Science & Engineering at Rajasthan Technical University, Kota. E-mail: guptacp2@rediffmail.com

The ability to predict spot price lends itself to a variety