Analysis and Autonomic Elasticity Control for
Multi-Server Queues Under Traffic Surges
Venkat Tadakamalla
Computer Science Department
George Mason University
Fairfax, VA, USA
vtadakam@masonlive.gmu.edu
Daniel A. Menascé
Computer Science Department
George Mason University
Fairfax, VA, USA
menasce@gmu.edu
Abstract—Many computing environments consist of a multi-
tude of servers that process requests that arrive from a population
of customers. Incoming requests that find all servers busy have
to wait until a server becomes idle. This type of queuing system
is known as a G/G/c system and has been extensively studied
in the queuing literature under steady state conditions. In this
paper we study multi-server systems that are subject to workload
surges during which time the average arrival rate of requests
exceeds the system’s capacity. This paper’s main contributions
are (1) The derivation of a set of equations to estimate the
impact of workload surges on response time; (2) A simulator
for a G/G/c system to evaluate the accuracy of the equations in
(1); and (3) The design, implementation, and extensive evaluation
of an autonomic controller for multi-server elasticity that uses
the equations derived in (1). The results show that our equations
estimate with great accuracy the impact of surges on response
time and that our autonomic controller is able to successfully
determine how to vary the number of servers to mitigate the
impact of workload surges.
Index Terms—elasticity control; cloud computing; autonomic
computing; queuing theory; G/G/c; workload surge;
I. INTRODUCTION
Many computing environments consist of a multitude of
servers that process requests that arrive from a population of
customers (e.g., web sites with multiple web servers at the
front tier). Each request is served by one server only. When
all servers are busy serving requests, arriving requests have
to wait in a waiting line until a server becomes available.
This type of queuing system is known as a G/G/c system
in Kendall’s notation [1]. In this notation, the first letter
represents the type of distribution of the interarrival time
of requests, the second letter indicates the distribution for
the service time of requests, and c denotes the number of
servers. The letter G stands for a generic distribution, while
M (for Markovian or memoryless) stands for an exponential
distribution, and D for a deterministic distribution (i.e., a
constant value).
There is an extensive literature on analytical models of
queuing systems in steady state, i.e., when the average
arrival rate of requests is smaller than the maximum rate at
which the system can perform work, known as the system
capacity (see e.g., [1]–[3]). The ratio between the average
arrival rate of requests and the system's capacity is called
traffic intensity and is typically denoted by ρ in the queuing
literature. A queuing system is in steady state when ρ < 1. For
some queuing systems (e.g., M/G/1, M/M/c) there are exact
steady-state results, while for others (e.g., G/G/1 and G/G/c)
there are approximations and/or bounds. Nevertheless, these
results apply only to systems in steady state.
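As an illustration of such exact steady-state results, the mean response time of an M/M/c queue follows from the classical Erlang C formula. The sketch below uses illustrative parameter values chosen for this example (they are not the paper's experimental settings):

```python
from math import factorial

def erlang_c_response_time(lam, mu, c):
    """Mean response time of an M/M/c queue in steady state.

    lam: average arrival rate, mu: per-server service rate,
    c: number of servers. Requires lam < c * mu (i.e., rho < 1).
    """
    a = lam / mu          # offered load
    rho = a / c           # traffic intensity; must be < 1 for steady state
    assert rho < 1, "unstable: arrival rate exceeds system capacity"
    # Erlang C: probability that an arriving request must wait
    top = a ** c / factorial(c)
    bottom = (1 - rho) * sum(a ** k / factorial(k) for k in range(c)) + top
    p_wait = top / bottom
    w_q = p_wait / (c * mu - lam)   # mean waiting time in the queue
    return w_q + 1 / mu             # mean response time = wait + service

# e.g., 5 req/sec arriving at 4 servers, each serving 2.5 req/sec (rho = 0.5)
print(round(erlang_c_response_time(5.0, 2.5, 4), 4))  # → 0.4348
```

No comparably exact closed form exists for G/G/c, which is why that case relies on approximations or bounds.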
However, most actual systems are subject to workload
surges (aka flash crowds), i.e., periods during which the arrival
rate exceeds the system’s capacity (see e.g., [4]–[9]). When
that happens, the queue length grows continuously and so does
the response time of requests. It turns out that the response
time continues to increase even after the surge is finished. In
other words, the response time does not return to its steady
state value as soon as the surge is over. As an illustration,
consider Fig. 1 that shows a rectangular-shaped workload
surge that lasts from t = 300 sec to t = 600 sec. The left
axis shows the response time R and the right axis shows the
average arrival rate. The arrival rate surges from a value of
5 to 20 requests/sec and stays at that level for 5 minutes. The
response time curve (blue curve) shows the response time of
transactions that leave the system at a given time instant. As
we can see, even though the traffic intensity returned to its
steady-state value of 0.5 at time 600 sec, the response time
peak of 290 seconds was observed at time 880 sec and it only
returned to its pre-surge level at time 1,260 sec.
As illustrated above, workload surges generate very high
response times that can be orders of magnitude higher than
corresponding steady state values and can be very disruptive
to users and damaging to organizations that provide comput-
ing services. Fluid approximations to queuing theory have
been suggested as a way to analyze the transient behavior
of queues [10]. In that formulation, customers arrive as a
continuous fluid with a time-varying arrival rate. The equations
we derive here have a fluid approximation flavor but go beyond
what has been proposed previously.
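The delayed recovery in Fig. 1 can be roughly reproduced with a back-of-the-envelope fluid calculation. This is a sketch only, not the paper's derivation: it assumes a system capacity of 10 requests/sec, so that the pre-surge arrival rate of 5 requests/sec yields the stated ρ = 0.5, and it ignores all stochastic variability.

```python
# Fluid-approximation sketch of the surge in Fig. 1 (assumed values).
capacity = 10.0                    # assumed capacity, req/sec (so rho = 5/10 = 0.5)
lam_pre, lam_surge = 5.0, 20.0     # pre-surge and surge arrival rates, req/sec
surge_start, surge_end = 300.0, 600.0   # surge interval, sec

# During the surge, the backlog grows at (lam_surge - capacity) req/sec.
backlog = (lam_surge - capacity) * (surge_end - surge_start)   # 3000 requests

# After the surge, the backlog drains at only (capacity - lam_pre) req/sec.
drain_time = backlog / (capacity - lam_pre)                    # 600 sec

print(f"backlog when surge ends: {backlog:.0f} requests")
print(f"queue empties near t = {surge_end + drain_time:.0f} sec")
```

The fluid estimate that the queue empties near t = 1,200 sec is in the same ballpark as the 1,260 sec recovery observed in Fig. 1; the gap is due to the randomness in arrivals and service times that the fluid view ignores.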
Cloud providers, such as Infrastructure as a Service (IaaS),
allow for resources in the form of virtual machines to be
dynamically added or removed from the set of available
resources to cope with traffic intensity variability so as to help
ensure that response times stay within expected values. This
is called elasticity (see e.g., Amazon Elastic Compute Cloud,
EC2). Elasticity has been defined as the degree to which a
system is able to adapt to workload changes by provisioning
IEEE International Conference on Cloud and Autonomic Computing
978-1-5386-1939-1/17 $31.00 © 2017 IEEE
DOI 10.1109/ICCAC.2017.16