J Grid Computing (2012) 10:521–552
DOI 10.1007/s10723-012-9227-2
A Provenance-based Adaptive Scheduling Heuristic
for Parallel Scientific Workflows in Clouds
Daniel de Oliveira · Kary A. C. S. Ocaña ·
Fernanda Baião · Marta Mattoso
Received: 30 September 2011 / Accepted: 9 August 2012 / Published online: 25 August 2012
© Springer Science+Business Media B.V. 2012
D. de Oliveira (✉) · K. A. C. S. Ocaña · M. Mattoso
Federal University of Rio de Janeiro - COPPE/UFRJ,
P.O. Box 68511, 21941-972 Rio de Janeiro, RJ, Brazil
e-mail: danielc@cos.ufrj.br
K. A. C. S. Ocaña
e-mail: kary@cos.ufrj.br
M. Mattoso
e-mail: marta@cos.ufrj.br
F. Baião
Federal University of the State of Rio de Janeiro – UNIRIO, Rio de Janeiro, RJ, Brazil
e-mail: fernanda.baiao@uniriotec.br
Abstract In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring parallel techniques and high-performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm in which resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high-throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open yet important challenges, such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a complex task, since many different criteria have to be taken into account and the elasticity of the cloud has to be exploited to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a three-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
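To make the idea of a three-objective cost model concrete, the following is a minimal sketch of how execution time, reliability and financial cost could be combined into a single score for choosing a virtual machine. It is not the cost model defined in this paper. The weighted-sum form, the normalization, and all names (`VM`, `weighted_cost`, `pick_vm`, the weights `w_time`, `w_rel`, `w_money`) are assumptions made only for illustration. In a provenance-based setting, the per-VM estimates would presumably be derived from data captured in earlier executions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical description of a candidate virtual machine."""
    name: str
    est_time_s: float      # estimated execution time of the activity on this VM
    reliability: float     # estimated probability of finishing without failure, in [0, 1]
    price_per_hour: float  # monetary price of running this VM for one hour


def money_cost(vm):
    """Estimated monetary cost of running the activity on this VM."""
    return vm.price_per_hour * vm.est_time_s / 3600.0


def weighted_cost(vm, w_time, w_rel, w_money, max_time_s, max_money):
    """Illustrative weighted-sum cost (an assumption, not the paper's model).

    Each term is scaled to [0, 1] so the weights are comparable; lower is better.
    """
    return (
        w_time * vm.est_time_s / max_time_s
        + w_rel * (1.0 - vm.reliability)
        + w_money * money_cost(vm) / max_money
    )


def pick_vm(vms, w_time, w_rel, w_money):
    """Greedily choose the VM with the lowest weighted cost for one activity."""
    # Fall back to 1.0 to avoid dividing by zero when all estimates are zero.
    max_time_s = max(vm.est_time_s for vm in vms) or 1.0
    max_money = max(money_cost(vm) for vm in vms) or 1.0
    return min(
        vms,
        key=lambda vm: weighted_cost(vm, w_time, w_rel, w_money, max_time_s, max_money),
    )
```

Under these assumptions, the weights play the role of the scientist's priorities: weighting only time favors the fastest VM, while weighting only money favors the cheapest one.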
Keywords Cloud computing · Scientific workflow · Scientific experiment · Provenance