Distrib. Syst. Engng 3 (1996) 9–19. Printed in the UK

Parallel application performance in a shared resource environment

Gregory D Peterson† and Roger D Chamberlain‡

Computer and Communications Research Center, Washington University, Campus Box 1115, One Brookings Drive, St. Louis, MO, 63130-4899, USA

Abstract. The utilization of networked, shared, heterogeneous workstations as an inexpensive parallel computational platform is an appealing idea. However, most performance models for parallel computation are oriented towards the use of tightly-coupled, dedicated, homogeneous processors. We develop and validate an analytic performance modelling methodology for synchronous iterative algorithms executing on networked workstations. The model includes the effects of application load, background load, and processor heterogeneity. We use two applications, nonlinear optimization and discrete-event simulation, to validate the model. Various policies for the use of the workstations are considered and the optimal (or near-optimal) scheduling found. The performance modelling methodology provides significant help in addressing scheduling and similar issues in a shared resource environment.

1. Introduction

To provide cost-effective computing resources for computationally intensive applications, parallel processing techniques are increasingly being applied to networks of existing workstations. The majority of the time, these workstations are idle and therefore under-utilized. The use of networked workstations as a parallel computing platform raises a number of interesting issues, especially when the primary use that motivated a workstation's purchase was the day-to-day computing needs of the workstation's owner. The processing environment assumed here is a network of workstations that are connected via a local area network. The workstations are not dedicated resources; several users may be utilizing them while the computation of interest is executing.
In addition, the power (i.e., computational speed) of the individual workstations may vary, although we will assume their basic architecture is the same (single CPU, significant local memory, possibly local disk). In order to facilitate cooperative work across the workstations, there are a number of systems available that provide message passing and process control primitives [26]. Our experimental results use the Parallel Virtual Machine (PVM) system [24].

The use of networked workstations as a distributed computing platform has many similarities with massively parallel processing (MPP) systems (e.g., parallel algorithm development, workload partitioning, communications scheduling, etc.). However, there are challenges that are specific to the distributed computing environment. This paper explores two of these challenges: the performance implications and policy issues associated with executing parallel programs on shared resources. We present accurate performance models that take into account both the mapping of application tasks to processors and the background load resulting from other users of the processors (e.g., the workstation's owner).

† E-mail: gdp@el.wpafb.af.mil
‡ E-mail: roger@ccrc.wustl.edu

We focus on the important class of synchronous iterative (or multiphase) algorithms. This is a large class of algorithms, including optimization, discrete-event simulation, solution to sets of partial differential equations, Gaussian elimination, and many others. Several authors have derived performance models (and can find completion times) for synchronous iterative algorithms running on dedicated, homogeneous resources [4, 9, 13]. In contrast, analytic performance results for the use of shared, heterogeneous resources have been sparse. In this paper, we develop a performance modelling methodology for synchronous iterative algorithms executing on shared, heterogeneous resources.
The performance model focuses on computational requirements; we assume a compute-intensive parallel application in which the computational requirements dominate. Hence, the modelling methodology is most accurate for relatively small processor populations (typically with fewer than 100 machines).

0967-1846/96/010009+11$19.50 © 1996 The British Computer Society, The Institution of Electrical Engineers and IOP Publishing Ltd

Once we have an accurate performance model, it can be used to help address a number of interesting policy questions that relate specifically to the use of shared computing resources. Essentially, we need to balance the competing requirements of the various users of the workstations. For an individual workstation's owner, the primary purpose of the machine is to service his or her day-to-day computing requirements (e.g., word processing, e-mail, etc.). For these types of computing needs, minimum response time is typically of primary importance. For the initiator of a large parallel application, minimum completion time is also an important goal. However, for these two types of users to effectively share a common