Performance Evaluation of Distributed Computing over Heterogeneous Networks

Ouissem Ben Fredj and Éric Renault

GET / INT, CNRS UMR 5157 SAMOVAR
9, rue Charles Fourier, 91011 Évry, France
Tel.: +33 1 60 76 45 73, Fax: +33 1 60 76 47 80
{ouissem.benfredj,eric.renault}@int-edu.eu

Abstract. RWAPI is a low-level communication interface designed for clusters of PCs. It has been developed to deliver performance to higher-level applications on a wide variety of architectures. We implemented RWAPI on top of the modular software architecture called GRWA. RWAPI supports Ethernet, InfiniBand and Myrinet network interconnects. This paper introduces RWAPI and the design of its network component on top of both InfiniBand and Myrinet interconnects. We obtained very low latency and high throughput compared to MPI results.

1 Introduction

High-speed network interconnects that offer low latency and high bandwidth are among the main reasons for the success of commodity cluster systems. Some of the leading high-speed networking interconnects include Gigabit-Ethernet, InfiniBand [1], Myrinet [2] and Quadrics [3]. Two common features shared by these interconnects are user-level networking and Direct Memory Access (DMA). The communication protocols best suited to using these features efficiently are one-sided protocols. This means that the completion of a send (resp. receive) operation does not require the receiver (resp. sender) process to take a complementary action. RDMA should be used to copy data directly to (or from) the remote user space.

Suppose that the receiver process has allocated a buffer to hold incoming data and the sender has allocated a send buffer. Prior to the data transfer, the receiver must have sent its buffer address to the sender. Once the sender holds the destination address, it initiates a direct-deposit send. This task does not interfere with the receiver process.
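The direct-deposit exchange described above can be illustrated with a small simulation. This is a minimal sketch, not RWAPI code: two threads stand in for the two processes, a shared `bytearray` stands in for the registered receive buffer, and the names `RemoteMemory`, `remote_write` and `run_exchange` are hypothetical and were chosen for this example only. No real RDMA hardware or library is involved.

```python
import queue
import threading


class RemoteMemory:
    """Hypothetical stand-in for a receive buffer exposed to remote writes."""

    def __init__(self, size):
        self.buffer = bytearray(size)     # buffer allocated by the receiver
        self.length = 0                   # number of bytes deposited so far
        self.arrived = threading.Event()  # incoming-message notification

    def remote_write(self, offset, payload):
        """Deposit payload directly into the buffer (simulated remote write)."""
        end = offset + len(payload)
        if end > len(self.buffer):
            raise ValueError("payload does not fit in the receive buffer")
        self.buffer[offset:end] = payload
        self.length = len(payload)
        self.arrived.set()                # signal completion to the receiver


def run_exchange(message):
    """Run one receiver/sender exchange and return what the receiver sees."""
    # Step 1: the receiver allocates its buffer and sends its "address"
    # (here, simply a reference to the object) to the sender.
    region = RemoteMemory(64)
    address_channel = queue.Queue()
    address_channel.put(region)

    def sender():
        target = address_channel.get()    # step 2: sender obtains the address
        target.remote_write(0, message)   # step 3: direct deposit

    worker = threading.Thread(target=sender)
    worker.start()

    # The receiver takes no complementary action for the transfer itself.
    # It could keep computing or poll region.arrived.is_set(); here it blocks.
    region.arrived.wait(timeout=5)
    worker.join()
    return bytes(region.buffer[:region.length])


if __name__ == "__main__":
    print(run_exchange(b"hello"))
```

The receiver only publishes the buffer address and waits for the notification; the copy itself is performed entirely on the sender's side, which is the property that makes the protocol one-sided.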
The receiver, for its part, keeps performing computation tasks, tests whether new messages have arrived, or blocks until an incoming-message event occurs.

At the network layer, many manufacturers have built RDMA features that ease the implementation of one-sided paradigms. For example, the HSL [4] network uses the PCI-Direct Deposit Component (PCI-DDC) [5] to offer a message-passing multiprocessor architecture based on a one-sided protocol. InfiniBand [1] and Quadrics [3] offer native one-sided communications. Myrinet [2,6] and QNIX [7] do not, but such features may be added, as for example in GM [8] with Myrinet, since Myrinet NICs are programmable.

In the past, remote-write has been implemented in generic message-passing libraries like MPI-2 [9] or dedicated message-passing libraries like the PUT interface [10,11]

R. Perrott et al. (Eds.): HPCC 2007, LNCS 4782, pp. 53–61, 2007. © Springer-Verlag Berlin Heidelberg 2007