Proposal for an Optimal Job Allocation Method for Data-intensive Applications based on Multiple Costs Balancing in a Hybrid Cloud Environment

Yumiko Kasae
Ochanomizu University
2-1-1, Otsuka, Bunkyo-ku
Tokyo 112-8610, JAPAN
81-3-5978-5393
yumiko@ogl.is.ocha.ac.jp

Masato Oguchi
Ochanomizu University
2-1-1, Otsuka, Bunkyo-ku
Tokyo 112-8610, JAPAN
81-3-5978-5393
oguchi@computer.org

ABSTRACT
Due to the explosive increase in the amount of information in computer systems, we need a system that can process large amounts of data efficiently. Cloud computing is an effective means of achieving this capacity and has spread throughout the world. In our research, we focus on hybrid cloud environments and propose a method for efficiently processing large amounts of data while responding flexibly to requirements on performance and cost. We have implemented this method as middleware. For data-intensive jobs running on this system, we have created a benchmark that can deterministically identify the saturation of the system resources. Using this benchmark, we determine the parameters of the middleware. The middleware can provide Pareto-optimal cost load balancing based on the needs of the user. The results of the evaluation indicate the success of the system. We then compare the processing time when these jobs are processed sequentially with the processing time obtained using this measurement.

Categories and Subject Descriptors
H.3.4 [INFORMATION STORAGE AND RETRIEVAL]: Systems and Software - Distributed systems; D.1.3 [Software]: PROGRAMMING TECHNIQUES - Concurrent Programming, Distributed programming

General Terms
Theory

Keywords
Hybrid cloud, Load balancing, Data processing, Performance, Cost Balance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICUIMC(IMCOM) ’13, January 17-19, 2013, Kota Kinabalu, Malaysia.
Copyright 2013 ACM 978-1-4503-1958-4 ...$15.00.

1. INTRODUCTION
In recent years, large amounts of data, referred to as big data, have become more common with the development of information and communications technology, creating the need for efficient data processing. As a platform for processing these data, hybrid cloud environments have become a focus of attention. In a hybrid cloud environment, users can access both public clouds and private clouds. Private clouds are built from the user company's own resources and are therefore secure, whereas public clouds provide scalable resources to users who pay metered rates. Combining the two can compensate for the shortcomings of each with respect to security and scalability. Hybrid clouds are therefore well suited to data-intensive jobs: as the amount of data grows, they can provide both secure and scalable processing.

However, performance and costs must be balanced. When we want to process large amounts of data more rapidly, using many resources provided by public clouds in addition to those provided by private clouds will increase speed, but the metered cost will also be greater. In contrast, if these jobs are processed almost exclusively on private cloud resources, users will not have to pay metered rates, but the job execution time will be longer. Thus, we need a system that can determine the optimal job placement based on cost limitations and required performance to ensure efficient processing in hybrid cloud environments.

Therefore, in this research, we propose a method for providing optimal job placement in hybrid cloud environments in terms of monetary cost and performance. We have developed this system as middleware.
In addition, the middleware provides optimal job placement for both CPU-intensive applications and data-intensive applications. In general, unlike CPU-intensive applications, whose load can be determined accurately from CPU usage, data-intensive applications make it difficult to determine whether resources are being used efficiently. In the proposed method, we created a benchmark in which the extent of the CPU processing load and the I/O processing load can be varied, and we measure the performance of hybrid clouds as an execution environment using this benchmark. Based on the results obtained using this benchmark, we propose a method of determining job execution status based on the status of the I/O resources.

In this paper, we describe the details of the middleware that implements the method proposed in this study. We have evaluated the balance of performance