Analysis and Autonomic Elasticity Control for Multi-Server Queues Under Traffic Surges

Venkat Tadakamalla
Computer Science Department, George Mason University, Fairfax, VA, USA
vtadakam@masonlive.gmu.edu

Daniel A. Menascé
Computer Science Department, George Mason University, Fairfax, VA, USA
menasce@gmu.edu

Abstract—Many computing environments consist of a multitude of servers that process requests that arrive from a population of customers. Incoming requests that find all servers busy have to wait until a server becomes idle. This type of queuing system is known as a G/G/c system and has been extensively studied in the queuing literature under steady-state conditions. In this paper we study multi-server systems that are subject to workload surges, during which the average arrival rate of requests exceeds the system’s capacity. This paper’s main contributions are: (1) the derivation of a set of equations to estimate the impact of workload surges on response time; (2) a simulator for a G/G/c system to evaluate the accuracy of the equations in (1); and (3) the design, implementation, and extensive evaluation of an autonomic controller for multi-server elasticity that uses the equations derived in (1). The results show that our equations estimate with great accuracy the impact of surges on response time and that our autonomic controller is able to successfully determine how to vary the number of servers to mitigate the impact of workload surges.

Index Terms—elasticity control; cloud computing; autonomic computing; queuing theory; G/G/c; workload surge

I. INTRODUCTION

Many computing environments consist of a multitude of servers that process requests that arrive from a population of customers (e.g., web sites with multiple web servers at the front tier). Each request is served by one server only. When all servers are busy serving requests, arriving requests have to wait in a waiting line until a server becomes available.
This type of queuing system is known as a G/G/c system in Kendall’s notation [1]. In this notation, the first letter represents the distribution of the interarrival time of requests, the second letter indicates the distribution of the service time of requests, and c denotes the number of servers. The letter G stands for a generic distribution, while M (for Markovian or memoryless) stands for an exponential distribution, and D for a deterministic distribution (i.e., a constant value).

There is an extensive literature on the study of analytical models of queuing systems in steady state, i.e., when the average arrival rate of requests is smaller than the maximum rate at which the system can perform work, i.e., the system capacity (see e.g., [1]–[3]). The ratio between the average arrival rate of requests and the system’s capacity is called the traffic intensity and is typically denoted by ρ in the queuing literature. A queuing system is in steady state when ρ < 1. For some queuing systems (e.g., M/G/1, M/M/c) there are exact steady-state results, while for others (e.g., G/G/1 and G/G/c) there are approximations and/or bounds. Nevertheless, these results apply only to systems in steady state.

However, most actual systems are subject to workload surges (aka flash crowds), i.e., periods during which the arrival rate exceeds the system’s capacity (see e.g., [4]–[9]). When that happens, the queue length grows continuously and so does the response time of requests. It turns out that the response time continues to increase even after the surge is finished. In other words, the response time does not return to its steady-state value as soon as the surge is over. As an illustration, consider Fig. 1, which shows a rectangular-shaped workload surge that lasts from t = 300 sec to t = 600 sec. The left axis shows the response time R and the right axis shows the average arrival rate. The arrival rate surges from 5 to 20 requests/sec and the surge lasts for 5 minutes.
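For the steady-state regime (ρ < 1) mentioned above, exact results exist for the M/M/c special case via the Erlang C formula. The sketch below is our own illustration, not the paper’s G/G/c model; the parameters c = 10 and μ = 1 are hypothetical, chosen only so that the capacity cμ = 10 requests/sec matches the pre-surge point ρ = 0.5 at 5 requests/sec of Fig. 1.

```python
from math import factorial

def erlang_c(c, lam, mu):
    """Erlang C: probability that an arriving request must wait in an M/M/c queue."""
    a = lam / mu                     # offered load (Erlangs)
    rho = a / c                      # traffic intensity
    if rho >= 1.0:
        raise ValueError("rho >= 1: no steady state (this is exactly the surge regime)")
    wait_term = a**c / (factorial(c) * (1.0 - rho))
    return wait_term / (sum(a**k / factorial(k) for k in range(c)) + wait_term)

def mm_c_response_time(c, lam, mu):
    """Mean steady-state response time: one service time plus the mean queuing delay."""
    return 1.0 / mu + erlang_c(c, lam, mu) / (c * mu - lam)

# Pre-surge operating point of Fig. 1 (hypothetical c and mu chosen so that
# c*mu = 10 req/s, giving rho = 0.5 at lam = 5 req/s):
R = mm_c_response_time(c=10, lam=5.0, mu=1.0)
```

Note that calling these functions with the surge-level arrival rate of 20 requests/sec raises an error, which mirrors the point made above: steady-state formulas simply do not apply while the arrival rate exceeds capacity.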
The response time curve (blue curve) shows the response time of transactions that leave the system at a given time instant. As we can see, even though the traffic intensity returned to its steady-state value of 0.5 at time 600 sec, the response time peak of 290 seconds was observed at time 880 sec and it only returned to its pre-surge level at time 1,260 sec.

As illustrated above, workload surges generate very high response times that can be orders of magnitude higher than the corresponding steady-state values and can be very disruptive to users and damaging to organizations that provide computing services. Fluid approximations to queuing theory have been suggested as a way to analyze the transient behavior of queues [10]. In that formulation, customers arrive as a continuous fluid with a time-varying arrival rate. The equations we derive here have a fluid approximation flavor but go beyond what has been proposed previously.

Cloud providers, such as Infrastructure as a Service (IaaS) providers, allow resources in the form of virtual machines to be dynamically added to or removed from the set of available resources to cope with traffic intensity variability, so as to help ensure that response times stay within expected values. This is called elasticity (see e.g., Amazon Elastic Compute Cloud, EC2). Elasticity has been defined as the degree to which a system is able to adapt to workload changes by provisioning

IEEE International Conference on Cloud and Autonomic Computing, 978-1-5386-1939-1/17 $31.00 © 2017 IEEE, DOI 10.1109/ICCAC.2017.16
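The fluid-approximation view mentioned above can be made concrete with a short numerical sketch. This is our own illustration under stated assumptions, not the paper’s derivation: requests are treated as a fluid filling a backlog B(t) that drains at the capacity C, with C = 10 requests/sec inferred from the stated ρ = 0.5 at 5 requests/sec (the text does not give C explicitly).

```python
# Fluid sketch of the Fig. 1 surge: the backlog B(t) obeys dB/dt = lam(t) - C
# whenever the result stays nonnegative; C = 10 req/s is an inferred value.
C = 10.0

def lam(t):
    """Rectangular surge: 5 -> 20 requests/sec during [300, 600) sec."""
    return 20.0 if 300.0 <= t < 600.0 else 5.0

dt = 0.01
B = B_max = t_empty = 0.0
for i in range(150_000):                     # integrate over [0, 1500) sec
    t = i * dt
    B = max(0.0, B + (lam(t) - C) * dt)      # forward Euler step on the backlog
    B_max = max(B_max, B)
    if B > 0.0:
        t_empty = t                          # last instant with a nonzero backlog

peak_delay = B_max / C                       # wait seen by the worst-off request
```

This rough model predicts a worst-case delay of about B_max/C = 300 sec and a backlog that only empties near t = 1200 sec, in the same ballpark as the simulated peak of 290 sec and recovery at 1,260 sec quoted above: the roughly 3,000 requests backlogged during the surge take an extra 600 sec to drain at the post-surge slack rate C − λ = 5 requests/sec, which is why the response time keeps rising after the surge ends.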