Cluster Computing 7, 113–122, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Seamless Access to Decentralized Storage Services in Computational Grids via a Virtual File System

RENATO J. FIGUEIREDO *
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA

NIRAV KAPADIA
Capital One Services, Inc., Glen Allen, VA 23060, USA

JOSÉ A.B. FORTES
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA

Abstract. This paper describes a novel technique for establishing a virtual file system that allows data to be transferred user-transparently and on-demand across computing and storage servers of a computational grid. Its implementation is based on extensions to the Network File System (NFS) that are encapsulated in software proxies. A key differentiator between this approach and previous work is the way in which file servers are partitioned: while conventional file systems share a single (logical) server across multiple users, the virtual file system employs multiple proxy servers that are created, customized and terminated dynamically, for the duration of a computing session, on a per-user basis. Furthermore, the solution does not require modifications to standard NFS clients and servers. The described approach has been deployed in the context of the PUNCH network-computing infrastructure, and is unique in its ability to integrate unmodified, interactive applications (even commercial ones) and existing computing infrastructure into a network computing environment.
Experimental results show that: (1) the virtual file system performs well in comparison to native NFS in a local-area setup, with mean overheads of 1% and 18% for the single-client execution of the Andrew benchmark in two representative computing environments; (2) the average overhead for eight clients can be reduced to within 1% of native NFS with the use of concurrent proxies; and (3) the wide-area performance is within 1% of the local-area performance for a typical compute-intensive PUNCH application (SimpleScalar), while for the I/O-intensive application Andrew the wide-area performance is 5.5 times worse than the local-area performance.

Keywords: file system, computational grid, network-computing, logical account, proxy

* Corresponding author. E-mail: renato@acis.ufl.edu

1. Introduction

Network-centric computing promises to revolutionize the way in which computing services are delivered to the end-user. Analogous to the power grids that distribute electricity today, computational grids will distribute and deliver computing services to users anytime, anywhere. Corporations and universities will be able to outsource their computing needs, and individual users will be able to access and use software via Web-based computing portals.

A computational grid brings together computing nodes, applications, and data distributed across the network to deliver a network-computing session to an end-user. This paper elaborates on mechanisms by which users, data, and applications can be decoupled from individual computers and administrative domains. The mechanisms, which consist of logical user accounts and a virtual file system, introduce a layer of abstraction between the physical computing infrastructure and the virtual computational grid perceived by users. This abstraction converts compute servers into interchangeable parts, allowing a computational grid to assemble computing systems at run time without being limited by the traditional constraints associated with user accounts, file systems, and administrative domains.

Specifically, this paper describes the structure of logical user accounts, and presents a novel implementation of a virtual file system that operates with such logical accounts. The virtual file system described in this paper allows data to be transferred on-demand between storage and compute servers for the duration of a computing session, while preserving a logical user account abstraction. It builds on an existing, de facto standard available for heterogeneous platforms: the Network File System (NFS). The virtual file system is realized via extensions to existing NFS implementations that allow reuse of unmodified clients and servers of conventional operating systems: the proposed modifications are encapsulated in software proxies that are configured and controlled by the computational grid middleware.

The described approach is unique in its ability to integrate unmodified applications (even commercial ones) and existing computing infrastructure into a heterogeneous, wide-area network computing environment. This work was conducted in the context of PUNCH [8,10], a platform for Internet computing that turns the World Wide Web into a distributed computing portal. It is designed to operate in a distributed, limited-trust environment that spans multiple administrative