The Cluster File System: Integration of High Performance Communication and I/O in Clusters

Rosario Cristaldi and Giulio Iannello
Dipartimento di Informatica e Sistemistica
Università di Napoli Federico II
rosario.cristaldi@unina.it, iannello@unina.it

Francesco Delfino
Laboratorio Nazionale CINI per l’Informatica e la Telematica Multimediali
francesco.delfino@napoli.consorzio-cini.it

Abstract

In this paper, we report on our experience in designing a portable parallel file system for clusters. The file system offers applications an interface compliant with MPI-IO, the I/O interface of the MPI-2 standard. The file system implementation relies upon MPI for internal coordination and communication. This guarantees high performance and portability over a wide range of hardware and software cluster platforms. The internal architecture of the file system has been designed to allow rapid prototyping of, and experimentation with, novel strategies for managing parallel I/O in a cluster environment. The discussion of the file system design and early implementation is completed with basic performance measurements confirming the potential of the approach.

1 Introduction

Cluster computing has recently emerged as an effective approach to building powerful and scalable platforms capable of meeting the performance requirements of the most challenging applications. In this respect, clusters represent an interesting alternative to the great variety of parallel machines that have been available for many years [21]. The most interesting feature of clusters is perhaps the commodity off-the-shelf (COTS) nature of their basic components, which makes them very competitive with parallel machines in terms of cost/performance ratio. The COTS nature of clusters, however, makes it challenging to achieve a single system image.
This has motivated intense research activity in the area of system software and architectural support to improve the integration between the basic components of clusters. An important result of these efforts has been the development of low-level communication libraries capable of delivering the performance of modern Gigabit LANs to applications [4, 5, 8].

More recently, the attention of researchers has also moved to other areas. An important issue that is receiving increasing attention is the development of architectural support for high performance I/O. Research in this area stems from the more general activity on high performance I/O, and most results are not specific to cluster architectures, though they could effectively contribute to improving the available support for high performance and scalable I/O in this area [3, 9, 10]. A project specifically addressing cluster computing is the development of the Parallel Virtual File System (PVFS) [17], which relies on TCP/IP for communication support. Preliminary experiments on different platforms confirm the effectiveness of the approach [3, 23].

However, the performance levels of communication subsystems and the technological advances in commodity off-the-shelf I/O devices require further work to fully explore the potential of these components. In particular, only data about homogeneous configurations and fairly simple access patterns have been reported in the literature. Also, the internal architecture of nodes devoted to I/O functions deserves more attention. To our knowledge, only architectures relying on TCP/IP for communication internal to the file system have been developed. While TCP/IP guarantees robustness and portability, it limits the study of the interaction between multiple I/O and communication activities to this protocol.

In this paper, we report on our experience in designing a portable parallel file system for clusters, designed specifically to address these research issues.
The file system, named CLUFS (CLUster File System), offers applications an interface compliant with MPI-IO, the I/O interface of the MPI-2 standard [19]. The internal architecture of the file system has been designed to allow rapid prototyping of, and experimentation with, novel strategies for managing parallel I/O in a cluster environment. It relies upon the MPI standard [18] for internal coordination and communication. This guarantees high performance and portability over a wide range of hardware and software cluster platforms.

Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID02) 0-7695-1582-7/02 $17.00 © 2002 IEEE

The discussion of the file system design and early