OddCI: On-Demand Distributed Computing Infrastructure

Rostand Costa¹˒², Francisco Brasileiro¹
¹ Federal University of Campina Grande
Systems and Computing Department
Distributed Systems Lab
Av. Aprígio Veloso, 882 - Bloco CO – Bodocongó
Campina Grande, Paraíba, Brazil
+55 83 33101365
{rostand.costa, fubica}@lsd.ufcg.edu.br

Guido Lemos Filho², Dênio Mariz Sousa²
² Federal University of Paraíba
Informatics Department
Digital Video Applications Lab
Campus I - Cidade Universitária
João Pessoa, Paraíba, Brazil
+55 83 32167093
{guido, denio}@lavid.ufpb.br

ABSTRACT
The availability of large quantities of processors is a crucial enabler of many-task computing. Voluntary computing systems have proven that it is possible to build computing platforms with millions of nodes to support the execution of embarrassingly parallel applications. These systems, however, lack the flexibility of more traditional grid infrastructures. On the other hand, the flexible infrastructures currently available can gather only tens of thousands of nodes. We propose a novel architecture for generic Distributed Computing Infrastructures (DCIs) that can be instantiated on demand to be, at the same time, flexible and highly scalable. Bringing together the scalability of voluntary computing, the flexibility of grid computing and the elasticity of cloud computing in a single arrangement, our proposal allows for fast setup, fast initialization and fast dismantling of customized DCIs supported by both dedicated and shared underlying infrastructures. Our approach leverages broadcast communication as an efficient mechanism to enable the aggregation of geographically distributed computing resources, including millions of non-traditional processing devices such as PDAs, mobile phones and digital TV receivers, using both opportunistic and non-opportunistic models. We show the feasibility of the proposed architecture by implementing it atop a digital television system.
We also assess the performance of such a system and show that it can be used to execute several classes of many-task computing applications with very high efficiency, substantially decreasing their response time.

Categories and Subject Descriptors
C.1.4 [Parallel Architectures]: Distributed architectures.

General Terms
Management, Performance.

Keywords
Distributed computing infrastructure; high-throughput computing; grid computing; cloud computing; many-task computing; digital TV; broadcast; on-demand instantiation.

1. INTRODUCTION
Parallel processing is a key technology for the timely processing of the ever-increasing quantity of data currently being generated by sensors, scientific experiments and simulation models, and, ultimately, by the digitalization that our society as a whole is experiencing. Some of the workloads that need to be processed are so large that the only feasible way to handle them is to break the processing into a very large number of loosely coupled sub-tasks and run them in parallel on as many processors as possible. The term many-task computing (MTC) has recently been coined to refer to this kind of parallel processing [1].

The aggregate processing throughput achieved by scheduling as many sub-tasks as possible to run in parallel speeds up the execution of the application, substantially reducing its makespan¹. In turn, a large amount of parallelism can only be achieved if there is a relatively high level of independence among the sub-tasks that comprise the application and the scheduler has access to a huge number of processors. In this paper we are concerned with the latter issue, i.e. providing ways to assemble large pools of processors for the execution of MTC applications.

Desktop grid computing has proved itself to be a suitable environment for high-throughput computing. Condor [2] is arguably the best-known representative of the existing technology for building high-throughput desktop grids.
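To make the makespan definition of footnote 1 concrete (latest sub-task completion time minus the application's submission time), here is a minimal Python sketch under a simplified, hypothetical model: sub-tasks are fully independent, all processors are identical, and dispatch and data-transfer overheads are ignored. The function name `simulate_makespan` and the greedy "earliest-free processor" policy are illustrative assumptions, not the scheduler used by the authors. It shows why a larger processor pool shrinks the makespan.

```python
import heapq
from typing import Sequence


def simulate_makespan(durations: Sequence[float], n_processors: int,
                      submission_time: float = 0.0) -> float:
    """Greedy list scheduling of independent sub-tasks; returns the makespan.

    Hypothetical model: identical processors, fully independent sub-tasks,
    and zero dispatch/transfer overhead.
    """
    if n_processors < 1:
        raise ValueError("need at least one processor")
    if not durations:
        raise ValueError("an application must have at least one sub-task")

    # Min-heap holding the time at which each processor becomes free.
    free_at = [submission_time] * n_processors
    heapq.heapify(free_at)

    completions = []
    for d in durations:
        start = heapq.heappop(free_at)   # earliest-available processor
        end = start + d
        completions.append(end)
        heapq.heappush(free_at, end)     # processor is busy until `end`

    # Footnote 1: latest completion time minus the submission time.
    return max(completions) - submission_time


if __name__ == "__main__":
    tasks = [60.0] * 1000  # 1,000 sub-tasks of 60 s each (illustrative numbers)
    for p in (10, 100, 1000, 5000):
        print(p, simulate_makespan(tasks, p))
```

With 1,000 sub-tasks of 60 s each, the sketch gives a makespan of 6,000 s on 10 processors but only 60 s on 1,000 processors. Adding processors beyond 1,000 yields no further gain, which illustrates why both the independence among sub-tasks and the size of the processor pool bound the achievable speed-up.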
Other systems that followed Condor’s philosophy have also proven to be equally effective [3][4]. These generic infrastructures are, however, limited-scale systems. Even if some sort of incentive mechanism is used [5], it is unlikely that a system comprising more than a few tens of thousands of computers will ever be assembled. Indeed, the largest existing systems using these technologies feature fewer than a few thousand computers [6].

¹ The application’s makespan is a key metric for measuring the efficiency of the execution of an MTC application; it is given by the difference between the latest completion time among all sub-tasks of the application and the submission time of the application.

© 2009 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the Brazilian Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
MTAGS '09, November 16th, 2009, Portland, Oregon, USA
Copyright © 2009 ACM 978-1-60558-714-1/09/11... $10.00

Voluntary computing platforms [7][8], on the other hand, are able to assemble huge amounts of resources to process the extremely large workloads of their typical applications. These powerful infrastructures are, however, less flexible in the types of applications that they support. Firstly, setting up a voluntary computing infrastructure has a cost that is significantly higher than that associated with assembling a desktop grid; this is mainly due