Celebrating Diversity in Volunteer Computing

David P. Anderson
University of California, Berkeley
davea@ssl.berkeley.edu

Kevin Reed
IBM
knreed@us.ibm.com

Abstract

The computing resources in a volunteer computing system are highly diverse in terms of software and hardware type, speed, availability, reliability, network connectivity, and other properties. Similarly, the jobs to be performed may vary widely in terms of their hardware and completion time requirements. To maximize system performance, the system’s job selection policy must accommodate both types of diversity. In this paper we discuss diversity in the context of World Community Grid (a large volunteer computing project sponsored by IBM) and BOINC, the middleware system on which it is based. We then discuss the techniques used in the BOINC scheduler to efficiently match diverse jobs to diverse hosts.

1. Introduction

Volunteer computing is a form of distributed computing in which the general public volunteers processing and storage resources to computing projects. BOINC is a software platform for volunteer computing [2]. BOINC is being used by projects in physics, molecular biology, medicine, chemistry, astronomy, climate dynamics, mathematics, and the study of games. There are currently 50 projects and 580,000 volunteer computers supplying an average of 1.2 PetaFLOPS.

Compared to other types of high-performance computing, volunteer computing has a high degree of diversity. The volunteered computers vary widely in terms of software and hardware type, speed, availability, reliability, and network connectivity. Similarly, the applications and jobs vary widely in terms of their resource requirements and completion time constraints. These sources of diversity place many demands on BOINC.
Foremost among these is the job selection problem: when a client contacts a BOINC scheduling server, the server must choose, from a database of perhaps a million jobs, those which are “best” for that client according to a complex set of criteria. Furthermore, the server must handle hundreds of such requests per second.

In this paper we discuss this problem in the context of the IBM-sponsored World Community Grid, a large BOINC-based volunteer computing project. Section 2 describes the BOINC architecture. Section 3 summarizes the population of computers participating in World Community Grid, and the applications it supports. In Section 4 we discuss the techniques used in the BOINC scheduling server to efficiently match diverse jobs to diverse hosts.

This work was supported by National Science Foundation award OCI-0721124.

2. The BOINC model and architecture

The BOINC model involves projects and volunteers. Projects are organizations (typically academic research groups) that need computing power. Projects are independent; each operates its own BOINC server. Volunteers participate by running BOINC client software on their computers (hosts). Volunteers can attach each host to any set of projects, and can specify the quota of bottleneck resources allocated to each project.

A BOINC server is centered around a relational database, whose tables correspond to the abstractions of BOINC’s computing model:

• Platform: an execution environment, typically the combination of an operating system and processor type (Windows/x86), or a virtual environment (VMWare/x86 or Java).
• Application: the abstraction of a program, independent of platforms or versions.
• Application version: an executable program. Each is associated with an application, a platform, a version number, and one or more files (main program, libraries, and data).
• Job: a computation to be done. Each is associated with an application (not an application version or platform) and a set of input files.
• Job instance: the execution of a job on a particular host. Each job instance is associated with an application version (chosen when the job is issued) and a set of output files.

Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009
978-0-7695-3450-3/09 $25.00 © 2009 IEEE
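
As a rough illustration of the abstractions listed in Section 2, the following Python sketch models them as plain data classes. All class names, field names, and the `issue_job` helper are hypothetical and chosen for readability; they are not taken from the BOINC database schema. The rule that the highest-numbered matching version wins is likewise an assumption made for this example. What the sketch is meant to show is the key relationship in the model: a job refers only to an application, and an application version is bound to it only when a job instance is created for a particular host.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Platform:
    name: str  # illustrative label, e.g. "windows_x86"


@dataclass(frozen=True)
class Application:
    name: str  # independent of platforms and versions


@dataclass
class AppVersion:
    app: Application
    platform: Platform
    version_num: int
    files: List[str] = field(default_factory=list)  # main program, libraries, data


@dataclass
class Job:
    app: Application  # note: no platform or version here
    input_files: List[str] = field(default_factory=list)


@dataclass
class JobInstance:
    job: Job
    app_version: AppVersion  # chosen when the job is issued
    host_id: int
    output_files: List[str] = field(default_factory=list)


def issue_job(job: Job, host_id: int, host_platform: Platform,
              app_versions: List[AppVersion]) -> Optional[JobInstance]:
    """Create a job instance for a host, or return None if no version fits.

    Assumption for this sketch: among versions of the job's application
    built for the host's platform, the highest version number is used.
    """
    candidates = [v for v in app_versions
                  if v.app == job.app and v.platform == host_platform]
    if not candidates:
        return None
    best = max(candidates, key=lambda v: v.version_num)
    return JobInstance(job=job, app_version=best, host_id=host_id)
```

Because the job carries no platform information, the same job can be issued to hosts of different platforms, each time paired with a different application version.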