Investigations on InfiniBand: Efficient Network Buffer Utilization at Scale

Galen M. Shipman 1, Ron Brightwell 2, Brian Barrett 1, Jeffrey M. Squyres 3, and Gil Bloch 4

1 Los Alamos National Laboratory, Los Alamos, NM USA, LA-UR-07-3198, {gshipman,bbarrett}@lanl.gov
2 Sandia National Laboratories, Albuquerque, NM USA, rbbrigh@sandia.gov
3 Cisco, Inc., San Jose, CA USA, jsquyres@cisco.com
4 Mellanox Technologies, Santa Clara, CA USA, gil@mellanox.com

Abstract. The default messaging model for the OpenFabrics “Verbs” API is to consume receive buffers in order, regardless of the actual incoming message size, leading to inefficient registered memory usage. For example, many small messages can consume large amounts of registered memory. This paper introduces a new transport protocol in Open MPI, implemented using the existing OpenFabrics Verbs API, that exhibits efficient registered memory utilization. Several real-world applications were run at scale with the new protocol; results show that global network resource utilization efficiency increases, allowing increased scalability and larger problem sizes on clusters, which can increase application performance in some cases.

1 Introduction

The recent emergence of near-commodity clusters with thousands of nodes connected with InfiniBand (IB) has increased the need for examining scalability issues with MPI implementations for IB. Several of these issues were originally discussed in detail for the predecessor to IB [1], and several possible approaches to overcoming some of the more obvious scalability limitations were proposed. This study examines the scalability, performance, and complexity issues of message buffering for implementations of MPI over IB. The semantics of IB Verbs place a number of constraints on receive buffers.
Los Alamos National Laboratory is operated by Los Alamos National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

F. Cappello et al. (Eds.): EuroPVM/MPI 2007, LNCS 4757, pp. 178–186, 2007.
© Springer-Verlag Berlin Heidelberg 2007

Receive buffers are consumed in FIFO order, and the buffer at the head of