Integrating Parallel File Systems with Object-Based Storage Devices

Ananth Devulapalli, Ohio Supercomputer Center, ananth@osc.edu
Dennis Dalessandro, Ohio Supercomputer Center, dennis@osc.edu
Pete Wyckoff, Ohio Supercomputer Center, pw@osc.edu
Nawab Ali, The Ohio State University, alin@cse.ohio-state.edu
P. Sadayappan, The Ohio State University, saday@cse.ohio-state.edu

ABSTRACT

As storage systems evolve, the block-based design of today’s disks is becoming inadequate. As an alternative, object-based storage devices (OSDs) offer a view where the disk manages data layout and keeps track of various attributes about data objects. By moving functionality that is traditionally the responsibility of the host OS to the disk, it is possible to improve overall performance and simplify management of a storage system. The capabilities of OSDs will also permit performance improvements in parallel file systems, such as further decoupling metadata operations and thus reducing metadata server bottlenecks.

In this work we present an implementation of the Parallel Virtual File System (PVFS) integrated with a software emulator of an OSD and describe an infrastructure for client access. Even with the overhead of emulation, performance is comparable to a traditional server-fronted implementation, demonstrating that serverless parallel file systems using OSDs are an achievable goal.

1. INTRODUCTION

The ability of current storage systems to supply the I/O data rates needed by high-end computing applications has been insufficient for many years. While Moore’s Law shows how processing elements are becoming faster over time due to increased chip densities, performance improvements in magnetic storage occur at a much slower rate. To meet the throughput and reliability demands of applications, parallel storage systems composed of commodity disks are used.
The use of commodity components in these systems lowers the cost, but adds the expense of more complex implementation, management and client overhead.

It is an opportune moment to consider redefining the way in which storage is treated in a computing system. Currently, a disk drive is treated as a “dumb” peripheral. The operating system instructs the disk what data to write, and in what location. Further, the operating system does not expose any information about how the data is related. However, modern disk drives are actually quite complex. They perform write buffering, block remapping, command reordering, selective read-ahead and other operations on their own. It is the historical mode of interaction with storage that inhibits major improvements in performance, scalability and manageability.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SC07 November 10-16, 2007, Reno, Nevada, USA. Copyright 2007 ACM 978-1-59593-764-3/07/0011 ...$5.00.

With the recent introduction of an ANSI standard for a new interface to storage devices [34], the semantic level of communication with a disk drive becomes significantly higher. The standard specifies an object-based interface to storage rather than a block-based interface, among other important features. Unlike block-based devices, an object-based storage device (OSD) is aware of the logical data organization as defined by users. It manages all the internal layout decisions for data and keeps a variable set of metadata for each object. These features radically change the role of a storage element in a computing system.
Rather than being relatively passive, as with block-based devices, an OSD can take a more active role in managing all aspects of storage, including data, metadata, security and reliability.

High-performance computing environments stress storage systems more heavily due to the specialized workloads seen there. Data is often streamed in large blocks, unlike the small random accesses used in desktop environments. Parallel applications also tend to access storage cooperatively, allowing for better overall throughput. However, the metadata load in parallel file systems can sometimes be a major bottleneck [32, 31, 23]. Thus, parallel file system designers are faced with a new and unique set of challenges for deploying OSDs. Overcoming these challenges will result in improved performance and reduced component count. While the operation set offered by an OSD is richer than that of traditional block-based devices, it does not provide all of the functionality desired by a parallel file system.

Many parallel file systems [4, 20, 35] already represent file data as objects. However, the storage devices themselves still maintain a simple block-based view of the storage medium. All decisions related to data layout and organization are the responsibility of the file system. As such, these implementations are unable to leverage the capabilities of OSDs. Our work instead aims to integrate parallel file systems with true object-based storage devices. However, since the OSD specification is relatively new, there are no readily