An Automated Approach to Create, Store, and
Analyze Large-scale Experimental Data in Clouds
Deepal Jayasinghe, Josh Kimball, Siddharth Choudhary, Tao Zhu, and Calton Pu.
Center for Experimental Research in Computer Systems, Georgia Institute of Technology
266 Ferst Drive, Atlanta, GA 30332-0765, USA.
{deepal, jmkimball, siddharthchoudhary, tao.zhu, calton}@cc.gatech.edu
Abstract—The flexibility and scalability of computing clouds
make them an attractive application migration target; yet, the
cloud remains a black box for the most part. In particular,
its opacity impedes the efficient but necessary testing and
tuning prior to moving new applications into the cloud. A
natural and presumably unbiased approach to reveal the cloud’s
complexity is to collect significant performance data by conducting more experimental studies. However, conducting large-scale
system experiments is particularly challenging because of the
practical difficulties that arise during experimental deployment,
configuration, execution and data processing. In this paper we
address some of these challenges through Expertus – a flexible
automation framework we have developed to create, store and
analyze large-scale experimental measurement data. We create
performance data by automating the measurement processes
for large-scale experimentation, including the application deployment, configuration, workload execution, and data collection
processes. We have automated the processing of heterogeneous
data, as well as its storage in a data warehouse, which we
have specifically designed for housing measurement data. Finally,
we have developed a rich web portal to navigate, statistically
analyze and visualize the collected data. Expertus combines
template-driven code generation techniques with aspect-oriented
programming concepts to generate the necessary resources to
fully automate the experiment measurement process. In Expertus,
a researcher provides only a high-level description of the
experiment, and the framework does everything else. At the end,
the researcher can graphically navigate and process the data in
the web portal.
Keywords-Automation, Benchmarking, Cloud, Code Generation, Data Warehouse, ETL, Performance, Visualization.
I. INTRODUCTION
As companies migrate their applications from traditional data
centers to private and public cloud infrastructures, they need
to ensure that their applications can move safely and smoothly
to the cloud. An application that performs one way in the data
center may not perform identically in computing clouds [19],
so companies need to consider their applications' present and
future scalability and performance. Neglecting the possible
performance impacts due to cloud platform migration could
ultimately lead to lower user satisfaction, missed SLAs (Service Level Agreements) and, worse, lower operating income.
For instance, a study by Amazon reported that an extra delay
of just 100ms could result in roughly a 1% loss in sales [27].
Similarly, Google found that a 500ms delay in returning search
results could reduce revenues by up to 20% [23].
One of the most reliable approaches to better understand the
cloud is to collect more data through experimental studies. By
using the experimental measurement data, we can understand
what has happened, explain why it happened, and more impor-
tantly predict what will happen in the future. Yet, conducting
large-scale performance measurement studies introduces many
challenges due to the associated complexity of application
deployment, configuration, workload execution, monitoring,
data collection, data processing, and data storage and analysis. In addition, due to the nature of the experiments, each
experiment produces a huge amount of heterogeneous data,
exacerbating the already formidable data processing challenge.
Heterogeneity comes from the use of various benchmarking
applications, software packages, test platforms (clouds), logging
strategies, and monitoring tools and strategies. Moreover, large-
scale experiments often result in failures; hence, storing incomplete or faulty data in the database is a waste of resources
and may impact the performance of data processing.
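To make the screening step concrete, the following is a minimal sketch of how faulty or incomplete runs could be filtered out before loading; the file names and completeness criteria here are hypothetical illustrations, not the actual checks performed by Expertus:

```python
# Hypothetical sketch: screen experiment runs before they reach the
# data warehouse, so incomplete or faulty runs are never loaded.
# The required-file list and run fields are assumptions for illustration.

REQUIRED_FILES = {"response_times.csv", "cpu.log", "throughput.csv"}

def is_complete(run):
    """A run is loadable only if every expected data file was collected
    and the workload finished without errors."""
    return REQUIRED_FILES <= set(run["files"]) and not run["failed"]

def filter_runs(runs):
    """Keep only the runs that pass the completeness check."""
    return [r for r in runs if is_complete(r)]

runs = [
    {"files": ["response_times.csv", "cpu.log", "throughput.csv"], "failed": False},
    {"files": ["response_times.csv"], "failed": False},  # data collection failed
    {"files": ["response_times.csv", "cpu.log", "throughput.csv"], "failed": True},
]
loadable = filter_runs(runs)
```

Filtering at ingestion time, rather than after loading, avoids wasting warehouse storage and keeps later queries from scanning unusable rows.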
We address those challenges through Expertus — a flexible
automation framework we have developed to create, store and
analyze large-scale experimental measurement data. In Expertus, we have created tools to fully automate the experiment
measurement process. Automation removes the error-prone
and cumbersome involvement of human testers, reduces the
burden of configuring and testing distributed applications, and
accelerates reliable application testing. In our
approach, a user provides a description of the experiment (i.e.,
application, cloud, configurations, and workload) through a
domain specific language (or through the web portal), and
then Expertus generates all of the resources that are necessary
to automate the measurement process. Next, the experiment
driver uses the generated resources and deploys and configures
the application, executes workloads, and monitors and collects
measurement data.
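As a rough illustration of this workflow, the sketch below shows a high-level experiment description being expanded into one concrete run per workload level; the field names, workload values, and driver command are hypothetical stand-ins, not the actual Expertus domain specific language:

```python
# Illustrative sketch only: a minimal, hypothetical experiment description
# and the kind of expansion a generator like Expertus might perform.
# All names (fields, values, commands) are assumptions, not the real DSL.

experiment = {
    "application": "RUBBoS",          # benchmark application
    "cloud": "emulab",                # target test platform
    "config": {"app_servers": 2, "db_servers": 1},
    "workloads": [1000, 2000, 3000],  # concurrent users per run
}

def generate_run_plan(spec):
    """Expand a high-level description into one concrete run per workload."""
    plan = []
    for users in spec["workloads"]:
        plan.append({
            "application": spec["application"],
            "cloud": spec["cloud"],
            "config": dict(spec["config"]),
            "users": users,
            # a hypothetical driver invocation for this run
            "command": f"run-benchmark --app {spec['application']} --users {users}",
        })
    return plan

plan = generate_run_plan(experiment)
```

The experiment driver would then iterate over such a plan, deploying and configuring the application once and executing each generated run in turn.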
The main contributions of this paper are the tools and approaches we have developed to automate the data-related
aspects. To address data processing and parsing challenges,
we have used ETL (extract, transform, and load) tools and
approaches [21], [22] to build a generic parser (Expertract) to
process the collected data. The proposed parser can process
more than 98% of the most commonly used file formats in
our experimental domain. To address the storage challenge,
we have designed a special data warehouse called Experstore
to store performance measurement data. Our data warehouse
is fully dynamic; that is, the tables are created and populated
IEEE IRI 2013, August 14-16, 2013, San Francisco, California, USA
978-1-4799-1050-2/13/$31.00 ©2013 IEEE