DRL-Cloud: Deep Reinforcement Learning-Based Resource Provisioning and Task Scheduling for Cloud Service Providers

Mingxi Cheng∗, Ji Li†, and Shahin Nazarian†
∗Duke University, Durham, NC, USA (mingxi.cheng@duke.edu)
†University of Southern California, Los Angeles, CA, USA ({jli724, shahin.nazarian}@usc.edu)

Abstract—Cloud computing has become an attractive computing paradigm in both academia and industry. Through virtualization technology, Cloud Service Providers (CSPs) that own data centers can structure physical servers into Virtual Machines (VMs) to provide services, resources, and infrastructures to users. Profit-driven CSPs charge users for service access and VM rental, and reduce power consumption and electric bills so as to increase their profit margin. The key challenge faced by CSPs is data center energy cost minimization. Prior works proposed various algorithms to reduce energy cost through Resource Provisioning (RP) and/or Task Scheduling (TS); however, they either have scalability issues or do not consider TS with task dependencies, a crucial factor for ensuring the correct parallel execution of tasks. This paper presents DRL-Cloud, a novel Deep Reinforcement Learning (DRL)-based RP and TS system, to minimize energy cost for large-scale CSPs with very large numbers of servers that receive enormous numbers of user requests per day. A deep Q-learning-based two-stage RP-TS processor is designed to automatically generate the best long-term decisions by learning from the changing environment, such as user request patterns and realistic electric prices. With training techniques such as target network, experience replay, and exploration and exploitation, the proposed DRL-Cloud achieves remarkably high energy cost efficiency and low reject rate, as well as low runtime with fast convergence.
Compared with one of the state-of-the-art energy-efficient algorithms, the proposed DRL-Cloud achieves up to 320% energy cost efficiency improvement while maintaining a lower reject rate on average. For an example CSP setup with 5,000 servers and 200,000 tasks, compared to a fast round-robin baseline, the proposed DRL-Cloud achieves up to 144% runtime reduction.

Keywords—Deep reinforcement learning, deep Q-learning, resource provisioning, task scheduling, cloud resource management.

I. INTRODUCTION

Cloud computing has emerged as a cogent and powerful paradigm that delivers omnipresent and on-demand access to a shared pool of configurable computing resources as a service through the Internet [1]. Virtualization is the fundamental technology of cloud computing: it enables multiple operating systems to run on the same physical platform and structures servers into Virtual Machines (VMs) [2]. VMs are used by Cloud Service Providers (CSPs) to provide infrastructures, platforms, and resources (e.g., CPU, memory, storage). In the cloud computing paradigm, CSPs are incentivized by the benefit of charging users for cloud service access, resource utilization, and VM rental, whereas users are attracted by the opportunity to run computation-, time-, and power-consuming applications on the cloud according to their own requirements, eliminating the expenditure of implementing them locally [3].

Despite the success of many well-known CSPs such as Google App Engine (GAE) and Amazon Elastic Compute Cloud (EC2), the tremendous energy cost of the electricity consumed by data centers is a serious challenge. Data center electricity consumption is projected to reach roughly 140 billion kilowatt-hours annually by 2020, costing 13 billion US dollars annually in electric bills [4].
Hence, in order to increase the profit margin, and also to reduce the carbon footprint for sustainable and economical development, it is imperative to minimize data center electricity consumption for large-scale CSPs.

According to [5], the energy usage of data centers has two important features: (i) servers tend to be more energy inefficient under low utilization rates (the power-optimal utilization rate of most servers ranges between 70% and 80%), and (ii) servers may consume a considerable amount of power in idle mode. Therefore, server consolidation and load balancing can be applied to improve overall energy efficiency by selectively shutting down idle servers and raising the utilization levels of active servers. Meanwhile, the Service-Level Agreement (SLA), negotiated between the CSP and users regarding privacy, security, availability, and compensation [6], must be consistently met.

Energy consumption and electric cost reduction are challenging for CSPs for two reasons. First, scalability of expenditure control is critical due to large-scale server farms and the enormous number of incoming requests per day, both of which are still growing. Second, since user request patterns change both in the short term (within a day) and the long term (from month/year to month/year), the energy and electric cost reduction method must be adaptable and capable of self-learning.

Many approaches have been proposed in the literature to improve the energy efficiency of data centers owned by CSPs through Resource Provisioning (RP) and/or Task Scheduling (TS) [3], [7–9]. Nevertheless, these prior works [3], [7–9] have scalability issues, and their offline algorithms have difficulty handling large inputs and adapting to changes, e.g., different user request patterns.
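Feature (ii), the idle-power draw, is precisely what server consolidation exploits. The following sketch uses a common linear server power model to quantify the effect; the wattage values (100 W idle, 300 W peak) are illustrative assumptions, not figures from this paper:

```python
def server_power(utilization, p_idle=100.0, p_peak=300.0):
    """Linear server power model: idle draw plus a load-proportional term.
    p_idle and p_peak are illustrative values in watts, not from the paper."""
    if utilization == 0.0:
        return 0.0  # treat zero utilization as a shut-down server
    return p_idle + (p_peak - p_idle) * utilization

# Two servers each at 30% utilization vs. the same total load
# consolidated onto one server at 60% (the other shut down).
spread = 2 * server_power(0.30)                        # 2 * 160 W = 320 W
consolidated = server_power(0.60) + server_power(0.0)  # 220 W + 0 W

print(spread, consolidated)  # 320.0 220.0
```

The 100 W saving comes entirely from eliminating one idle-power term, matching feature (ii). Note that a purely linear model cannot reproduce the 70%–80% optimum of feature (i), since its performance-per-watt rises monotonically with utilization; that optimum arises from nonlinear effects in real servers.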
The recently proposed Deep Reinforcement Learning (DRL) technique, which has been shown successful in playing Atari and Go games [10–12], excels at handling complicated control problems with high-dimensional state spaces and low-dimensional action spaces by utilizing deep neural networks [13–17]. Inspired by this, N. Liu et al. applied DRL to (partially) solve the resource allocation problem in cloud computing [18], but without detailed scheduling for tasks with data dependencies, which is critical to guaranteeing that tasks execute correctly [19].

To comprehensively solve the energy cost reduction problem, we propose the DRL-Cloud framework, the first DRL-based, highly scalable and adaptable RP and TS system capable of handling large-scale data centers and changing user requests. In this paper, a general type of realistic pricing policy comprising time-of-use pricing (TOUP) and real-time pricing (RTP) [20], [21] is used, together with a Pay-As-You-Go billing agreement (as in GAE and EC2). All deadlines are hard deadlines, and a task is rejected if its hard deadline is violated. DRL-Cloud is comprised of two major parts: i) user request acceptance and decoupling into a job queue and a task ready queue; ii) energy cost minimization by our DRL-based two-stage RP-TS processor, and fast convergence is guaranteed by training techniques in
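The training techniques named in the abstract (target network, experience replay, and exploration/exploitation) can be illustrated with a minimal Q-learning loop. The linear Q-function, toy reward, and all constants below are illustrative stand-ins, not the paper's actual RP-TS processor or state/action encoding:

```python
import random
from collections import deque

import numpy as np

random.seed(0)
rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS = 4, 3  # toy sizes, purely illustrative

def new_state():
    # Random features plus a constant bias feature, so a linear
    # Q-function can represent state-independent values.
    return np.append(rng.standard_normal(STATE_DIM - 1), 1.0)

def q_values(weights, state):
    """Linear Q-function approximator: one weight row per action."""
    return weights @ state

def toy_step(action):
    """Hypothetical stand-in environment: action 0 is always best."""
    reward = 1.0 if action == 0 else -0.1
    return reward, new_state()

weights = np.zeros((N_ACTIONS, STATE_DIM))  # online network
target_weights = weights.copy()             # target network (frozen copy)
replay = deque(maxlen=1000)                 # experience replay buffer
gamma, alpha, epsilon = 0.9, 0.01, 0.2

state = new_state()
for step in range(500):
    # Exploration and exploitation: epsilon-greedy action selection.
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = int(np.argmax(q_values(weights, state)))

    reward, next_state = toy_step(action)
    replay.append((state, action, reward, next_state))
    state = next_state

    # Experience replay: learn from a random minibatch rather than
    # only the latest transition, breaking temporal correlations.
    for s, a, r, s2 in random.sample(replay, min(32, len(replay))):
        # The target network supplies a slowly moving bootstrap target.
        td_target = r + gamma * np.max(q_values(target_weights, s2))
        td_error = td_target - q_values(weights, s)[a]
        weights[a] += alpha * td_error * s  # semi-gradient update

    if step % 50 == 0:
        target_weights = weights.copy()     # periodic target sync
```

After training, the greedy policy should come to prefer the consistently rewarded action. In DRL-Cloud the state would instead encode server/VM status and electric price, and each action a provisioning or scheduling decision, with deep networks in place of the linear approximator.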