A User-level Secure Grid File System

Ming Zhao, Renato J. Figueiredo
Advanced Computing and Information Systems Laboratory (ACIS)
Electrical and Computer Engineering, University of Florida
{ming, renato}@acis.ufl.edu

ABSTRACT

A grid-wide distributed file system provides convenient data access interfaces that facilitate fine-grained cross-domain data sharing and collaboration. However, existing widely adopted distributed file systems do not meet the security requirements of grid systems. This paper presents a Secure Grid File System (SGFS) which supports GSI-based authentication and access control, end-to-end message privacy, and integrity. It employs user-level virtualization of NFS to provide transparent grid data access leveraging existing, unmodified clients and servers. It supports user- and application-tailored security customization per SGFS session, and leverages secure management services to control and configure the sessions. The system conforms to the grid security infrastructure (GSI) and allows for seamless integration with other grid middleware. An SGFS prototype is evaluated with both file system benchmarks and typical applications, demonstrating that it can achieve strong security with acceptable overhead, and that it substantially outperforms native NFS in wide-area environments by using disk caching.

1. INTRODUCTION

Distributed “Grid” computing systems have been successfully applied in several domains of science, providing for sharing of resources and data across administrative boundaries. A key challenge arising in such systems is data management: how to provide data seamlessly to applications and users in wide-area environments. In the absence of widely deployed grid-wide distributed file systems (DFSs), existing solutions are often based on explicit file transfer (“staging”), or require users to program applications with specific grid-enabling APIs.
In contrast, a grid-wide file system can facilitate data access and sharing by exposing the familiar interfaces of local-area DFSs (such as NFS [41][7][40]) to users. It is also desirable for applications that cannot be modified, require implicit data access, have complex access patterns, operate on large and sparse data sets, or require fine-grained data sharing, because data transfers can be performed on demand, on a per-block basis.

Security is one of the most important concerns for data management in grid environments, where data are shared across organizations with limited mutual trust, and stored and transferred on resources with limited security. Providing secure grid-wide data access is a challenging task with existing DFSs. In a grid system, virtual organizations are dynamically established, applications and services are dynamically initiated, and entities and trust are dynamically created. Conventional DFSs cannot meet this challenge, because they are designed for general file system usage (typically for LANs) and favor static, homogeneous configurations rather than the dynamic environments encountered in grid deployments. Nonetheless, recent work has shown the feasibility of applying user-level techniques to build wide-area file systems on top of existing kernel implementations [34][16]. Examples of systems that use NFS to mount grid data are found in the middleware of Legion [43], PUNCH [26], and In-VIGO [1].

This paper proposes such a user-level solution, a Secure Grid File System (SGFS), that addresses the aforementioned challenges. It enables secure network communications based on mature technologies (SSL/TLS [17][12]), and employs widely accepted security tokens (X.509/GSI certificates [42]) to provide compatible grid authentication and flexible access control. SGFS allows data sessions to be created on a per-user or per-application basis, and such sessions can be customized with respect to their security policies and mechanisms.
Furthermore, it leverages service-based middleware with standards-conforming security (WS-Security [54]) to manage and configure the sessions. Overall, the proposed approach makes the following contributions: 1) it achieves strong security for grid-wide file systems; 2) it leverages user-level techniques that support unmodified applications and operating systems; 3) it supports flexible selection of security configurations for file systems based on user and application needs; 4) it conforms to the grid security infrastructure (GSI) and therefore can be easily integrated with other grid middleware and systems.

The paper evaluates an implementation of SGFS with file system benchmarks (IOzone and PostMark), and with applications that capture the behavior of both interactive access to data in a development environment (MAB) and scientific computing that exhibits a mix of CPU and I/O activity (Seismic). Experiments were conducted in a LAN to study the overhead of the user-level techniques, and also in an emulated WAN setup that captures the target environment for SGFS. Results from this analysis demonstrate that the solution achieves strong security with reasonable overhead, and that performance can be traded off against security strength for the file systems. They also show that SGFS can effectively hide high network latencies using disk caching and deliver efficient data access in wide-area environments, substantially outperforming native NFS.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SC07 November 10-16, 2007, Reno, Nevada, USA
(c) 2007 ACM 978-1-59593-764-3/07/0011…$5.00