Towards Automated Performance Prediction in Bulk-Synchronous Parallel Discrete-Event Simulation

Mauricio Marín
Escuela de Computación, Universidad de Magallanes
Casilla 113-D, Punta Arenas, Chile
E-mail: mmarin@ona.fi.umag.cl

Abstract

This paper discusses the running time cost of performing discrete-event simulation on the bulk-synchronous parallel (BSP) model of computing [12, 16, 22]. The BSP model provides a general purpose framework for parallel computing which is independent of the architecture of the computer, and thereby it enables the development of portable software. In addition, the structure of BSP computations allows the accurate determination of the cost of parallel algorithms. We use this feature to devise a performance prediction methodology that enables the designer of parallel simulation models to predict in advance the systems which are amenable for efficient execution on a given BSP computer. The methodology is simple enough to be automated in parallel simulation languages.

1. Introduction

In BSP, any parallel computer is seen as composed of a set of processor-local-memory components which communicate with each other through messages. The computation is organised as a sequence of supersteps. In each superstep, the processors may perform sequential computations on local data and/or send messages to other processors. The messages are available for processing at their destinations by the next superstep, and each superstep is ended with the barrier synchronisation of the processors.
The total running time cost of a BSP program is the cumulative sum of the costs of its supersteps, and the cost of each superstep is the sum of three quantities: $w$, $h g$ and $l$, where (i) $w$ is the maximum of the computations performed by each processor, (ii) $h$ is the maximum number of words transmitted in messages sent/received by each processor, with each one-word transmission costing $g$ units of running time, and (iii) $l$ is the cost of barrier synchronising the processors. The effect of the computer architecture is costed by the parameters $g$ and $l$, which are increasing functions of the number of processors $P$. These values can be empirically determined [16].

Parallel discrete-event simulation (PDES) [3, 15], on the other hand, adopts the view of systems composed of a set of logical processes (LPs) that send messages to each other. The LPs encapsulate the state variables of the system and the messages contain events scheduled to occur at specific points of the simulation time. The events may modify the LP state variables and may schedule the occurrence of new events in other LPs. Once the LPs are placed onto the processors, one is faced with the non-trivial problem of synchronising the parallel occurrence of events. A number of synchronisation protocols have been proposed to solve this problem efficiently [3]. They can be classified into conservative and optimistic protocols, within each of which asynchronous and synchronous modes of operation can be identified. In most cases, synchronous protocols operate in a bulk-synchronous fashion which is very similar to that promoted by the BSP model. In addition, the efficient BSP implementation of asynchronous protocols has been investigated in [7].

It is widely accepted that the implementation of a parallel simulation is a quite involved and costly task. In contrast, it is much easier to quickly produce a prototypic sequential simulator for the same system.
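
As an illustration of the superstep cost accounting described earlier in this section, the following minimal sketch computes the cost of a BSP program as the sum over supersteps of $w + h g + l$. It is not taken from the paper: the `Superstep` record, the function names and the sample numbers are hypothetical, and $h$ is assumed here to be the largest number of words sent or received by any single processor.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Superstep:
    """Per-processor measurements for one superstep (hypothetical record)."""
    work: Sequence[float]     # computation units performed by each processor
    sent: Sequence[int]       # words sent by each processor
    received: Sequence[int]   # words received by each processor


def superstep_cost(step: Superstep, g: float, l: float) -> float:
    """Cost of one superstep under the w + h*g + l model."""
    w = max(step.work)
    # Assumption: h is the maximum over processors of max(sent, received).
    h = max(max(s, r) for s, r in zip(step.sent, step.received))
    return w + h * g + l


def program_cost(steps: Iterable[Superstep], g: float, l: float) -> float:
    """Total cost: cumulative sum of the superstep costs."""
    return sum(superstep_cost(step, g, l) for step in steps)


# Made-up measurements for two processors and two supersteps.
steps = [
    Superstep(work=[10, 20], sent=[3, 1], received=[1, 3]),
    Superstep(work=[5, 5], sent=[0, 4], received=[4, 0]),
]
total = program_cost(steps, g=2.0, l=7.0)  # (20 + 3*2 + 7) + (5 + 4*2 + 7)
```

In practice the values of `g` and `l` would be the empirically measured parameters of the target machine, while the per-superstep figures could come from an instrumented run.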
It would be desirable then to predict well in advance the feasible speedups to be achieved with a parallel simulator before making the effort of implementing it. Moreover, the sequential simulator could be used to approximately predict the behaviour of the actual parallel simulator. Note