Many-Constraint and Many-Objective Optimization with Bias Index for Intercloud Multi-Workflow Resource Provisioning

Courtney Powell
Information Initiative Center
Hokkaido University
Sapporo, Japan
ORCID: 0000-0003-2556-0920

Katsunori Miura
Department of Information and Management Science
Otaru University of Commerce
Otaru, Japan
k-miura@res.otaru-uc.ac.jp

Masaharu Munetomo
Information Initiative Center
Hokkaido University
Sapporo, Japan
ORCID: 0000-0002-5750-9217

Abstract—Optimal deployment of big data applications consisting of multiple components is difficult in geo-distributed intercloud environments. This is because of the numerous infrastructure components and options available and the variety of constraints that must be satisfied, such as application, cloud infrastructure, and data processing and privacy-related constraints. The task becomes even more complicated when multiple scientific workflows must be executed but the financial budget for acquiring cloud resources is severely limited. This paper proposes a many-objective constrained optimization framework that addresses these problems. The proposed framework first conducts constraint satisfaction via equivalent transformation and then performs many-objective optimization using nondominated sorting, reference points, and elitism, providing a unified approach to solving constrained many-objective multi-workflow resource provisioning problems in geo-distributed intercloud environments. In the case of multiple workflows, both optimization for each workflow and optimization for the ensemble of workflows are considered. Furthermore, a bias index is proposed that indicates, on an objective-by-objective basis, the effect of the configuration generated for each ensemble of workflows on the optimal configuration of each constituent workflow.
It also provides a means of ascertaining, at a granular level, the relative fairness of each objective in each composite resource configuration, and can be used to prioritize certain aspects of a workflow when deciding which configuration to use. We demonstrate the efficacy of the proposed framework on two genome analysis workflows, for which site availability and resource reliability must be maximized, deployment cost and makespan must be minimized, and several constraints must be satisfied.

Keywords—big data application deployment, constraint satisfaction, equivalent transformation, geo-distributed cloud, multi-objective optimization, multi-workflow optimization, NSGA, predicate logic specification

I. INTRODUCTION

Big data applications can be deployed in an intercloud environment consisting of numerous services provided by various cloud service providers. However, numerous service options have to be considered when deploying each component of a target application. Such options include instance types (Amazon EC2 has more than 100 instance types (https://aws.amazon.com/ec2/instance-types/)), the regions in which to deploy each virtual machine (VM), and whether to assign only one component to a VM or multiple components to one VM. In addition, various constraints must be satisfied, such as data location and the associated legal policies (e.g., the General Data Protection Regulation (GDPR)), especially when privacy-sensitive data must be processed. Furthermore, it has been shown that, even for the same provider, the same instance type may be priced differently in different regions [1]. Thus, in geo-distributed intercloud [1] environments, it is virtually impossible for users to manually select optimal configurations of cloud service options for deploying their applications. The problem is compounded when a project has multiple workflows to be executed in a limited time and with a severely constrained budget.
In addition, some workflows may be considered more important than others and may therefore require a disproportionate share of the budget, some may require highly reliable resources, and some may need to be completed by a specific deadline, yet all should be executed with respect to an overall deadline and budget.

This work was supported by JST CREST Grant Number JPMJCR1501, Japan.

We have been developing a resource optimization engine to deploy big data workflows by selecting cloud resources that minimize the makespan and deployment cost of the target workflow(s) and maximize availability and reliability, while also considering various constraints such as maximum cost, required performance, and data locations/policies. Our work is part of the “Application-Centric Overlay Cloud Utilizing Inter-Cloud” project [2] supported by JST CREST. The goal of the project is to build an application-centric overlay cloud in an intercloud environment that automatically deploys big data workflows such as genome processing applications. The project consists of four major subprojects with the following respective objectives: (1) development of overlay intercloud middleware, (2) building a big data processing intercloud testbed that includes supercomputers, (3) optimal resource selection in the intercloud, and (4) processing and co-simulation for big data applications, such as genomic data applications. For the overlay cloud providers, the CREST Aida group developed a Virtual Cloud Service System (VCSS) that allows users to build and operate effective applications in