Campus-Wide Computing: Early Results Using Legion at the University of Virginia¹

Andrew S. Grimshaw, Anh Nguyen-Tuong, William A. Wulf

Abstract

The Legion project at the University of Virginia is an attempt to provide system services that give users the illusion of a single virtual machine, one that offers both improved response time via parallel execution and greater throughput. Legion targets both workstation clusters and larger, wide-area assemblies of workstations, supercomputers, and parallel supercomputers. Rather than construct Legion from scratch, we are extending an existing object-oriented parallel processing system by aggressively incorporating lessons learned over twenty years by the heterogeneous distributed systems community. The campus-wide virtual computer (CWVC) is an early Legion prototype. In this paper we present the challenges that had to be overcome to realize a working CWVC, as well as its performance on a production biochemistry application.

1. Introduction

Providing resources to computationally demanding applications at the lowest cost is a challenge facing many organizations. The traditional way to obtain the necessary cycles has been to use a supercomputer. An alternative, less costly solution that has emerged recently is to use networks of existing high-performance workstations instead, managing the collection of resources as a single entity. These systems are variously called “workstation farms” or “workstation clusters”. One advantage of the cluster approach is that the resources are often already in place and under-utilized. A second advantage is that the cost per MIPS or FLOPS is much lower. A key problem that must be addressed in cluster computing is management. The collection of workstations is just that, a collection. Without system software to tie the machines together, it is not easy for a user to exploit cycles on many different workstations.
There are two broad categories of solutions to the problem of managing workstation resources: throughput-oriented systems and response-time-oriented systems. Throughput-oriented systems aim to exploit available resources in order to service the largest number of jobs, where a job is a single program that does not communicate with other jobs. There are several

¹ This work is partially funded by NSF grants ASC-9201822 and CDA-8922545-01, National Laboratory of Medicine grant (LM04969), NRaD contract N00014-94-1-0882, and ARPA grant J-FBI-93-116.