Many-Constraint and Many-Objective Optimization
with Bias Index for Intercloud Multi-Workflow
Resource Provisioning
Courtney Powell
Information Initiative Center
Hokkaido University
Sapporo, Japan
ORCID: 0000-0003-2556-0920
Katsunori Miura
Department of Information and Management Science
Otaru University of Commerce
Otaru, Japan
k-miura@res.otaru-uc.ac.jp
Masaharu Munetomo
Information Initiative Center
Hokkaido University
Sapporo, Japan
ORCID: 0000-0002-5750-9217
Abstract—Optimal deployment of big data applications
consisting of multiple components is difficult in geo-distributed
intercloud environments. This is because of the numerous
infrastructure components and options available and the variety
of constraints that must be satisfied, such as application, cloud
infrastructure, and data processing and privacy-related
constraints. The task becomes even more complicated when
multiple scientific workflows must be executed but the financial
budget for the acquisition of cloud resources is severely limited.
This paper proposes a many-objective constrained optimization
framework that solves these problems. The proposed framework
first conducts constraint satisfaction via equivalent
transformation and then performs many-objective optimization using
nondominated sorting, reference points, and elitism, providing a
unified approach to solving constrained many-objective
multi-workflow resource provisioning problems in
geo-distributed intercloud environments. In the case of multiple
workflows, both optimization for each workflow and optimization
for the ensemble of workflows are considered. Furthermore, a
bias index is proposed that indicates, on an objective-by-objective
basis, the effect of the configuration generated for each
ensemble of workflows on the optimal configuration of each
constituent workflow. It also provides a means of ascertaining, at
a granular level, the relative fairness of each objective in each
composite resource configuration, and can be used as a tool for
prioritizing certain aspects of a workflow when deciding on the
optimal configuration to use. We demonstrate the efficacy of
the proposed framework through two genome analysis workflows,
for which site availability and resource reliability need to be
maximized, deployment cost and makespan need to be minimized,
and several constraints must be satisfied.
Keywords—big data applications deployment, constraint
satisfaction, equivalent transformation, geo-distributed cloud, multi-
objective optimization, multi-workflow optimization, NSGA,
predicate logic specification
I. INTRODUCTION
Big data applications can be deployed in an intercloud
environment consisting of numerous services provided by
various cloud service providers. However, numerous service
options have to be considered when deploying each component
of a target application. Such options include instance types
(Amazon EC2 has more than 100 instance types
(https://aws.amazon.com/ec2/instance-types/)), regions in
which to deploy each virtual machine (VM), and whether to
assign only one component to a VM or multiple components to
one VM. In addition, various constraints must be satisfied, such
as data location and the associated legal policies (e.g., the General
Data Protection Regulation (GDPR)), especially when there is a need to
process privacy-sensitive data. Furthermore, it has been shown
that even from the same provider, the same instance type may
have different pricing schemes in different regions [1]. Thus, in
geo-distributed intercloud [1] environments, it is virtually
impossible for users to manually select optimal configurations
of cloud service options for deploying their applications.
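To give a rough sense of the scale of the selection problem described above, the following minimal Python sketch counts the candidate deployments under simplifying assumptions that are ours and not part of the proposed framework: every VM independently takes one (instance type, region) pair, components may be grouped onto VMs in any way, and no constraints are applied. The function names and the input numbers are hypothetical and serve only as illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to partition n labelled components into k non-empty groups (VMs)."""
    if n == k:
        return 1  # also covers the base case S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Either the n-th component joins one of k existing groups,
    # or it forms a new group on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_configurations(n_components: int, n_instance_types: int, n_regions: int) -> int:
    """Count unconstrained deployments (illustrative assumption, see lead-in).

    For each grouping of components into k VMs, every VM independently
    picks one of (n_instance_types * n_regions) options.
    """
    options_per_vm = n_instance_types * n_regions
    return sum(
        stirling2(n_components, k) * options_per_vm ** k
        for k in range(1, n_components + 1)
    )


if __name__ == "__main__":
    # Hypothetical inputs: 5 components, 100 instance types, 10 regions.
    print(count_configurations(5, 100, 10))
```

Even with these small hypothetical inputs, the count exceeds 10^15, which illustrates why manual selection is impractical before any constraints or objectives are taken into account.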
The problem is compounded when a project has multiple
workflows to be executed in a limited time and with a severely
constrained budget. Moreover, some workflows may be
considered more important than others and
may therefore require a disproportionate share of the budget,
some may require highly reliable resources, and some may need to
be completed by a specific deadline, yet all must be
executed within the overall deadline and budget.
We have been developing a resource optimization engine to
deploy big data workflows by selecting cloud resources that
minimize the makespan and deployment cost of the target
workflow(s) and maximize availability and reliability, while
also considering various constraints such as maximum cost,
required performance, and data locations/policies. Our work is
part of the “Application-Centric Overlay Cloud Utilizing Inter-
Cloud” project [2] supported by JST CREST. The goal of the
project is to build an application-centric overlay cloud in an
intercloud environment that automatically deploys big data
workflows such as genome processing applications. The project
consists of four major subprojects with the following respective
objectives: (1) development of overlay intercloud middleware,
(2) building a big data processing intercloud testbed including
supercomputers, (3) optimal resource selection in the intercloud,
and (4) processing and co-simulation of big data applications,
such as genomic data applications. For the overlay cloud providers, the CREST
Aida group developed a Virtual Cloud Service System (VCSS)
that allows users to build and operate effective applications in
This work was supported by JST CREST Grant Number JPMJCR1501, Japan.