Design and Implementation of a Metadata-Rich File System

Sasha Ames, Maya B. Gokhale, and Carlos Maltzahn
University of California, Santa Cruz
Lawrence Livermore National Laboratory

Abstract

Despite continual improvements in the performance and reliability of large-scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large-scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first-class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations, and superior performance on normal file metadata operations.

1 Introduction

The annual creation rate of digital data, already 468 exabytes in 2008, is growing at a compound annual rate of 73%, with a projected 10-fold increase over the next five years [19, 18]. Sensor networks of growing size and resolution continue to produce ever-larger data streams that form the basis for weather forecasting, climate change analysis and modeling, and homeland security.
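The graph data model summarized in the abstract, in which files, user-defined attributes, and file relationships are all first-class objects, can be pictured with a small in-memory sketch. Everything below is a hypothetical illustration, not the QFS interface: the names `Graph`, `FileNode`, `add_file`, `link`, and `query` are invented for this example, and the query is a plain Python attribute filter with an optional one-hop link traversal, standing in for (not reproducing) Quasar's XPATH-extended syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileNode:
    """Hypothetical: a file as a first-class object with user-defined attributes."""
    name: str
    attrs: dict[str, str] = field(default_factory=dict)


@dataclass
class Graph:
    """Hypothetical sketch: files are nodes, relationships are labeled directed edges."""
    files: dict[str, FileNode] = field(default_factory=dict)
    links: list[tuple[str, str, str]] = field(default_factory=list)  # (source, label, target)

    def add_file(self, name: str, **attrs: str) -> FileNode:
        # Attributes live with the file itself, not in a separate database.
        node = FileNode(name, dict(attrs))
        self.files[name] = node
        return node

    def link(self, source: str, label: str, target: str) -> None:
        # Relationships are stored as data rather than encoded in paths or filenames.
        if source not in self.files or target not in self.files:
            raise KeyError("both endpoints must be existing files")
        self.links.append((source, label, target))

    def query(self, attr: str, value: str, via: str | None = None) -> list[str]:
        # Select files whose attribute matches; optionally follow links labeled `via`.
        hits = {n for n, f in self.files.items() if f.attrs.get(attr) == value}
        if via is None:
            return sorted(hits)
        return sorted({t for s, label, t in self.links if s in hits and label == via})


if __name__ == "__main__":
    g = Graph()
    g.add_file("raw_001.dat", instrument="camera", night="2008-03-01")
    g.add_file("raw_002.dat", instrument="camera", night="2008-03-02")
    g.add_file("catalog_001.csv", kind="catalog")
    g.link("raw_001.dat", "derived", "catalog_001.csv")
    print(g.query("instrument", "camera"))                # ['raw_001.dat', 'raw_002.dat']
    print(g.query("night", "2008-03-01", via="derived"))  # ['catalog_001.csv']
```

The file names and attribute values in the usage block are made up. The point of the sketch is that a single store answers both "which files have this attribute" and "which files are related to those files", which the Traditional Architecture splits between a file system and a database.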
New digital content, such as video, music, and documents, also adds to the world's digital repositories. These data streams must be analyzed, annotated, and searched to be useful; however, current file system architectures do not meet these data management challenges.

[Figure 1: To manage file data and user-defined metadata, the Traditional Architecture (left) places file data in conventional file systems and user-defined metadata in databases. In contrast, a metadata-rich file system (right) integrates storage, access, and search of structured metadata with unstructured file data. Left panel, "Traditional Architecture": an application accesses data in a file system through the POSIX API and rich metadata in an RDBMS, with its own query processor, through SQL. Right panel, "Metadata-Rich File System": an application accesses both data and rich metadata in a single file system, with an integrated query processor, through POSIX plus a query language.]

A variety of ad hoc schemes exist today for attaching user-defined metadata to files: a distinguished filename suffix, metadata encoded in the filename, metadata embedded as comments in the file, or adjunct files associated with the primary data files. Application developers who need to store more complex, interrelated metadata typically resort to the Traditional Architecture shown on the left in Figure 1, storing data in file systems as a series of files and managing annotations and other metadata in relational databases. An example of this approach is the Sloan Digital Sky Survey [39, 40], in which sky objects and related metadata are stored in a Microsoft SQL Server database and refer by absolute pathname to the raw data stored in regular file systems.

This approach likely emerged because of file systems' ability to store very large amounts of data, combined with databases' superior ability, compared to traditional file systems, to query data. Each complemented