A Parallel Data Storage Interface to GridFTP

Alberto Sánchez, María S. Pérez, Pierre Gueant, Jesús Montes, and Pilar Herrero

Facultad de Informática
Universidad Politécnica de Madrid
Madrid, Spain

Abstract. Most grid projects need to access huge volumes of data. To support this need, a variety of data services have emerged in the grid world. One of the most successful initiatives in that field is GridFTP, a high-performance transfer protocol based on FTP but optimized for wide area networks. Although GridFTP provides reasonably good performance, GridFTP servers remain a bottleneck for data-intensive applications. One of the most important modules of a GridFTP server is the Data Storage Interface (DSI), which specifies how to read from and write to the storage system, allowing the server to transform the data. With the aim of improving the performance of the GridFTP server, we have designed a new DSI based on MAPFS, a parallel file system. This paper describes this new DSI and its evaluation, showing the advantages of handling data through this optimized GridFTP server.

Keywords: Data grid, GridFTP, Data Storage Interface (DSI), parallel file system, MAPFS.

1 Introduction

In grid projects there is usually a need to transfer large files among different virtual organizations. This is especially significant in data-intensive applications, where accessing and handling data is the most critical process.

The best-known protocol for transferring files over wide area networks is GridFTP [1], an extension of the popular FTP protocol that provides high-performance transfers in grid environments.

Although there are different approaches for increasing the performance of transfers between clients and servers (e.g., parallelism and striping), access to a single server constitutes a bottleneck for the whole system, since the I/O bandwidth can be considerably lower than the network bandwidth.
Nevertheless, an advantage of GridFTP is that its DSI (Data Storage Interface) can be modified in order to transform the data retrieval process. Approaches from the parallel I/O field can be successfully applied to this scenario. This is the main motivation of our work. We have built a new DSI, named MAPFS-DSI, which, by making use of MAPFS [14], a parallel file system, can considerably improve the performance of GridFTP.

The rest of this paper is organized as follows. Section 2 describes the problems related to data management in grids. Section 3 presents our proposal, MAPFS-DSI, which enhances file transfer in grid environments. Section 4 analyzes the evaluation of MAPFS-DSI. Section 5 discusses related work. Finally, Section 6 presents the main conclusions and outlines ongoing and future work.

R. Meersman, Z. Tari et al. (Eds.): OTM 2006, LNCS 4276, pp. 1203–1212, 2006.
© Springer-Verlag Berlin Heidelberg 2006
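
The bottleneck argument made in the introduction (a single server's I/O bandwidth can fall well below the network bandwidth) can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper and does not model MAPFS or GridFTP themselves. The function names and the bandwidth figures are hypothetical, and it assumes, optimistically, that storage bandwidth scales linearly with the number of I/O nodes.

```python
def transfer_rate(network_bw: float, io_bw_per_node: float, io_nodes: int = 1) -> float:
    """Upper bound on the end-to-end rate (MB/s).

    Assumption (not from the paper): aggregate storage bandwidth grows
    linearly with the number of I/O nodes, and the transfer is limited
    by the slower of the network and the aggregate storage bandwidth.
    """
    if io_nodes < 1:
        raise ValueError("io_nodes must be at least 1")
    return min(network_bw, io_bw_per_node * io_nodes)


def transfer_time(file_size_mb: float, network_bw: float,
                  io_bw_per_node: float, io_nodes: int = 1) -> float:
    """Idealized time in seconds to move a file of the given size."""
    return file_size_mb / transfer_rate(network_bw, io_bw_per_node, io_nodes)


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    NETWORK_BW = 120.0   # MB/s available on the wide area link
    DISK_BW = 40.0       # MB/s delivered by one storage node
    FILE_SIZE = 1200.0   # MB

    for nodes in (1, 2, 4):
        rate = transfer_rate(NETWORK_BW, DISK_BW, nodes)
        secs = transfer_time(FILE_SIZE, NETWORK_BW, DISK_BW, nodes)
        print(f"{nodes} I/O node(s): {rate:.0f} MB/s, {secs:.0f} s")
```

Under these assumed figures, a single storage node limits the transfer to the disk rate, while spreading the file over several I/O nodes moves the limit back to the network. Real systems will scale less cleanly than this ideal model suggests.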