Dynamic Resource Provisioning for Data Streaming Applications in a Cloud
Environment
Smita Vijayakumar, Qian Zhu, Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University Columbus OH 43210
{vijayaks,zhuq,agrawal}@cse.ohio-state.edu
Abstract—The recent emergence of cloud computing is mak-
ing the vision of utility computing realizable, i.e., computing
resources and services from a cloud can be delivered, utilized,
and paid for in the same fashion as utilities like water or
electricity. Current cloud service providers have taken some
steps towards supporting the true pay-as-you-go or a utility-
like pricing model, and current research points towards more
fine-grained allocation and pricing of resources in the future.
In such environments, resource provisioning becomes a
challenging problem, since one needs to avoid both under-
provisioning (leading to application slowdown) and over-
provisioning (leading to unnecessary resource costs). In this
paper, we consider this problem in the context of streaming
applications. In these applications, since the data is generated
by external sources, the goal is to carefully allocate resources
so that the processing rate can match the rate of data arrival.
We have developed a solution that can handle unexpected data
rates, including transient rates. We evaluate our approach
using two streaming applications in a virtualized environment.
I. INTRODUCTION
Utility computing was a vision stated more than 40 years
ago [18]. It refers to the desire that computing resources
and services be delivered, utilized, and paid for as utilities
such as water or electricity. The recent emergence of cloud
computing is making this vision realizable. Examples of
efforts in this area include Infrastructure as a Service (IaaS)
providers like Amazon EC2 [1] and Software as a Service
(SaaS) providers like Google AppEngine [3] and Microsoft
Azure [2]. In brief, a cloud comprises the computation and
storage resources available over the Internet. For many small
and new companies, this approach has the advantage of a
low or no initial cost, as compared to having to acquire and
manage hardware. In addition, a key advantage of this model
is the dynamic scalability of resources and services, with a
pay-as-you-go model, consistent with the vision of utility
computing.
The elasticity offered by the cloud computing model
avoids both under-provisioning and over-provisioning of
resources, which have been typical problems when a fixed
set of resources is managed by a company or a user. In
other words, dynamic provisioning of computing
and storage resources is possible in cloud computing. With a
pay-as-you-go model, resource provisioning should be per-
formed carefully, to keep the resource budget to a minimum,
while meeting an application’s needs. Current cloud service
providers have taken some steps towards supporting the true
pay-as-you-go or a utility-like pricing model. For example,
in Amazon EC2, users pay on the basis of the number and
type of instances they use, where an instance is characterized
(and priced) on the basis of parameters like CPU family/cores,
memory, and disk capacity. The ongoing research in this area
is pointing towards the possibility of supporting more fine-
grained allocation and pricing of resources [10], [13]. Thus,
we can expect cloud environments where CPU allocation
in a virtual environment can be changed on-the-fly, with
associated change in price for every unit of time.
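As a concrete illustration, a fine-grained utility pricing model of the kind anticipated above could charge for the CPU fraction actually allocated, integrated over time. The following is a hypothetical sketch (the function name and the $0.10 per CPU-hour rate are illustrative assumptions, not any provider's actual billing):

```python
# Hypothetical sketch of fine-grained, pay-per-use pricing: the CPU
# allocation of a virtual machine can change on-the-fly, and the charge
# is the allocation level integrated over time at a per-unit price.

def utility_cost(allocations, price_per_cpu_hour):
    """allocations: list of (cpu_fraction, hours) intervals.
    Returns the total charge, summing price over each interval."""
    return sum(cpu * hours * price_per_cpu_hour
               for cpu, hours in allocations)

# Example: run at 50% CPU for 2 hours, then scale up to 100% CPU for
# 1 hour, at an assumed rate of $0.10 per CPU-hour.
cost = utility_cost([(0.5, 2.0), (1.0, 1.0)], price_per_cpu_hour=0.10)
print(f"${cost:.2f}")  # 0.5*2*0.10 + 1.0*1*0.10 = $0.20
```

Under such a model, lowering the CPU allocation during periods of low load directly reduces cost, which is what makes careful dynamic provisioning worthwhile.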
In such cloud environments, resource provisioning be-
comes a challenging problem. In this paper, we consider
this problem in the context of streaming applications. A
significant development over the last few years has been
the emergence of the stream model of data processing. In
this model, data arrives continuously and
needs to be processed in real time, i.e., the processing rate
must match the arrival rate. Two trends have contributed
to the emergence of this model for scientific applications.
First, scientific simulations and a growing number of high-
precision data collection instruments (e.g., sensors attached
to satellites, medical imaging modalities, or environmental
sensors) are generating data continuously, and at a high rate.
The second is the rapid improvement in the technologies for
Wide Area Networking (WAN). As a result, often the data
can be transmitted faster than it can be stored or accessed
from disks within a cluster. In view of the growing popularity
of the streaming model of data processing, a number of systems
have been developed in the last 6-7 years specifically target-
ing this class of applications [11], [7], [6], [20], [8], [22].
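The constraint that the processing rate must keep pace with the arrival rate can be made concrete with a toy discrete-time model (a sketch for intuition only; the buffer size, rates, and function name are illustrative, not taken from any of the systems cited above):

```python
import collections

def process_stream(arrivals, service_rate, capacity):
    """Toy discrete-time model of a streaming stage: at each step,
    arrivals[t] items enter a bounded buffer and up to service_rate
    items are processed. Returns items dropped due to buffer overflow."""
    buffer = collections.deque()
    dropped = 0
    for n in arrivals:
        for _ in range(n):
            if len(buffer) >= capacity:
                dropped += 1  # buffer overflow: data is lost
            else:
                buffer.append(1)
        for _ in range(min(service_rate, len(buffer))):
            buffer.popleft()
    return dropped

# Arrival rate (3 items/step) exceeds the processing rate (2 items/step),
# so a buffer of size 4 fills up and data is lost from then on.
print(process_stream([3] * 10, service_rate=2, capacity=4))  # 8 drops
# When the processing rate matches the arrival rate, nothing is lost.
print(process_stream([2] * 10, service_rate=2, capacity=4))  # 0 drops
```

The point of the model is that any sustained gap between arrival and service rates, however small, eventually overflows a finite buffer; this is why provisioning must track the data arrival rate rather than a fixed estimate.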
Resource provisioning for streaming applications involves
several unique challenges. Since the data is generated by
external sources, the goal is to carefully allocate resources
so as to match the rate of data arrival. A higher provisioning
of resources will unnecessarily increase the resource budget.
At the same time, with a lower provisioning of resources,
the processing rate will fail to match the data arrival rate,
and eventually cause buffer-overflow and loss of data. In our
approach, each stage of a distributed streaming application
running on a server is monitored for its current load pattern. We want
to provision the resources to match the processing time with
2nd IEEE International Conference on Cloud Computing Technology and Science
978-0-7695-4302-4/10 $26.00 © 2010 IEEE
DOI 10.1109/CloudCom.2010.95