Design and Implementation of a Metadata-Rich File System

Sasha Ames⋆†, Maya B. Gokhale†, and Carlos Maltzahn⋆
⋆University of California, Santa Cruz
†Lawrence Livermore National Laboratory

Abstract

Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

1 Introduction

The annual creation rate of digital data, already 468 exabytes in 2008, is growing at a compound annual growth rate of 73%, with a projected 10-fold increase over the next five years [19, 18]. Sensor networks of growing size and resolution continue to produce ever larger data streams that form the basis for weather forecasting, climate change analysis and modeling, and homeland security.
New digital content, such as video, music, and documents, also adds to the world’s digital repositories. These data streams must be analyzed, annotated, and searched to be useful; however, currently used file system architectures do not meet these data management challenges.

[Figure 1: The Traditional Architecture (left), to manage file data and user-defined metadata, places file data in conventional file systems and user-defined metadata in databases. In contrast, a metadata-rich file system (right) integrates storage, access, and search of structured metadata with unstructured file data. Diagram labels, left panel (Traditional Architecture): Application; File System (Data; API: POSIX); RDBMS (Rich Metadata; Query Processor; API: SQL). Right panel (Metadata-Rich File System): Application; File System (Data; Rich Metadata; Query Processor; API: POSIX + Query Language).]

There are a variety of ad hoc schemes in existence today to attach user-defined metadata to files, such as using a distinguished suffix, encoding metadata in the filename, embedding metadata as comments in the file, or maintaining adjunct files related to primary data files. Application developers needing to store more complex inter-related metadata typically resort to the Traditional Architecture approach shown on the left in Figure 1, storing data in file systems as a series of files and managing annotations and other metadata in relational databases. An example of this approach is the Sloan Digital Sky Survey [39, 40], in which sky objects and related metadata are stored in a Microsoft SQL Server database and refer to the raw data stored in regular file systems by absolute pathname.

This approach likely emerged because of file systems’ ability to store very large amounts of data, combined with databases’ superiority to traditional file systems in their ability to query data. Each complemented