Performance of a High-Level Parallel Language on a High-Speed Network

Henri Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim Rühl, Kees Verstoep

Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands

Abstract

Clusters of workstations are often claimed to be a good platform for parallel processing, especially if a fast network is used to interconnect the workstations. Indeed, high performance can be obtained for low-level message passing primitives on modern networks like ATM and Myrinet. Most application programmers, however, want to use higher-level communication primitives. Unfortunately, implementing such primitives efficiently on a modern network is a difficult task, because their software overhead accounts for a much larger share of the total communication cost than on a traditional, slow network (such as Ethernet). In this paper we investigate the issues involved in implementing a high-level programming environment on a fast network. We have implemented a portable runtime system for an object-based language (Orca) on a collection of processors connected by a Myrinet network. Many performance optimizations were required in order to let application programmers benefit sufficiently from the faster network. In particular, we have optimized message handling, multicasting, buffer management, fragmentation, marshalling, and various other issues. The paper analyzes the impact of these optimizations on the performance of the basic language primitives as well as parallel applications.

Keywords: clusters, threads, communication protocols, multicast, Myrinet, Illinois Fast Messages.

1 Introduction

Due to their wide availability, networks of workstations are an attractive platform for parallel processing. A major problem, however, is their high communication cost. Workstations are typically connected by a Local Area Network (LAN) such as Ethernet, which is orders of magnitude slower than the interconnection networks used for modern multicomputers.
This research is supported in part by a PIONIER grant from the Netherlands Organization for Scientific Research (N.W.O.).

Potentially, this problem can be solved by using a faster, more modern LAN, such as ATM, Fast Ethernet, Myrinet [4], or SCI [11]. Unfortunately, even with fast, Gigabit/sec networks, several important performance problems remain. First, some of the modern networks (and their software) are not designed to support parallel processing. They offer impressive bandwidths, but the communication latencies are often only marginally better than on traditional Ethernet. For parallel processing, latency is at least as important as bandwidth. Second, to make network-based parallel computing successful, it is essential that easy-to-use programming environments are developed. Giving the programmer the ability to unreliably send a 48-byte packet from one machine to