A Comparative Study on Low-level APIs for Myrinet and SCI-based Clusters

Fábio Abreu Dias de Oliveira, Marcos Ennes Barreto, Rafael Bohrer Ávila and Philippe Olivier Alexandre Navaux
Institute of Informatics - Federal University of Rio Grande do Sul
Av. Bento Gonçalves, 9500 - Bloco IV
PO Box 15064, 90501-910, Porto Alegre - RS, Brazil
E-mail: {fabreu,barreto,avila,navaux}@inf.ufrgs.br

ABSTRACT

This paper presents a survey of some of the most important APIs (Application Programming Interfaces) for Myrinet and SCI networks, which are widely used in the context of cluster computing. The surveyed APIs are comparatively analyzed. This text is not intended as a programming manual; the analysis focuses on the APIs' functionality and programming facilities.

Keywords: APIs, Cluster Computing, Message Passing Programming, Shared Memory Programming, Myrinet, SCI.

1. INTRODUCTION

Clusters of PCs with high-speed networks are emerging as an economical alternative to dedicated parallel computers and, accordingly, questions regarding their programmability have become increasingly important to resolve. Recent advances in interconnection technology have given clusters a very attractive cost-performance ratio. As a natural consequence, a question arises: which paradigm is the most suitable for programming clusters of PCs or workstations?

Traditionally, message passing has been the model of choice for programming loosely coupled systems such as clusters, because it is usually the way to obtain the best performance on such systems. The drawback of this approach, however, is that the programmer must explicitly define the data and work distribution across the nodes of the cluster; consequently, this paradigm differs considerably from the traditional sequential model.
On the other hand, the shared memory programming model is more intuitive, but its historically poor performance on clusters has led programmers to avoid it, despite its inherently easier programmability. The shared memory paradigm is the one used in SMP (Symmetric MultiProcessor) architectures.

Nowadays, the Myrinet high-performance network [2] is widely used to interconnect clusters of PCs or workstations and, accordingly, the message passing paradigm has been adopted by programmers. There are a number of low-level message passing APIs especially designed to efficiently exploit the Myrinet hardware capabilities, namely GM [7], BIP [9] and FM [8], among others.

In contrast to the message-passing-oriented Myrinet network, the SCI (Scalable Coherent Interface) standard [6] opens the possibility of achieving good performance while programming under the shared memory paradigm. This emerging interconnection technology narrows the performance gap between the message passing and shared memory paradigms. SCI provides hardware DSM (Distributed Shared Memory) support, allowing the definition of shared memory segments across the nodes of the cluster. The achievable performance of SCI is not only comparable to but can even exceed that of Myrinet. In spite of the hardware DSM support provided by SCI, some research groups still adhere to the message passing paradigm, even on SCI-based clusters. In order to change this situation, some groups have made efforts to implement APIs for SCI shared memory programming; YASMIN [10], SMI [3] and the SISCI API [5] resulted from these efforts.

This paper is organized as follows: in section 2 we present the characteristics of the Myrinet network; section 3 points out the main issues concerning the APIs for Myrinet, whereas section 4 explicitly compares the APIs presented in the previous section; in section 5, the SCI network philosophy is presented; sections 6 and 7 are concerned with APIs for SCI and, finally, section 8 presents our conclusions.
2. THE MYRINET NETWORK

Two research projects inspired the underlying technology of the Myrinet network: an experimental fine-grain multicomputer and the ATOMIC LAN, developed by research groups at Caltech and USC, respectively. Myrinet is the main product of the Myricom company, which was founded by members of both groups. Hundreds of installations around the world have adopted the Myrinet technology as the interconnection network of their clusters. The fact that the software and hardware specifications are open is perhaps the foremost reason for this popularity, mainly in the academic community, since it encourages the development of advanced protocols tailored to particular performance needs.

Myrinet is a switched network based on cut-through (wormhole) routing. The network interfaces can perform dynamic mapping of the network topology, which is arbitrary, in contrast to the regular topologies typically found in massively parallel computers, such as the mesh or the hypercube. The full-duplex links support a bandwidth of 1.28 Gbit/s in each direction, provide reverse flow control and error control, and are monitored by the switch. Each link connects either two switches or a network interface to a switch, resulting in an arbitrary topology (see figure 1).