An XML-based Semantic Description of Distributed File Systems

Sabin-Corneliu Buraga
Faculty of Computer Science, “A.I.Cuza” University of Iaşi, Romania
busaco@infoiasi.ro – http://www.infoiasi.ro/~busaco

Abstract

Modern operating systems must incorporate a variety of Internet services, especially World-Wide Web facilities, to access distributed resources using file system mechanisms. In this paper we present a high-level model describing a general distributed file system. The proposed description is based on the Resource Description Framework (RDF) recommendation of the World-Wide Web Consortium, a general-purpose XML-based technology that enables the semantic description of resources on the Web. To represent the RDF statements about various file characteristics, an XML-based language, the Extensible File Properties Markup Language (XFiles), is presented.

1. Introduction

A distributed system is a collection of loosely coupled computers interconnected by a communication network. From the point of view of a specific computer in a distributed system, the rest of the machines (also known as hosts) and their respective resources are remote, whereas its own resources are local [10, 12].

An important part of a distributed operating system, the file system provides file services to clients (other hosts of the network). The client interface of a file service is formed by a set of primitives, called file operations, such as opening a file, removing a file, reading from a file, writing to a file, and so on.

A distributed file system [7] is a file system whose clients, servers, and storage devices are dispersed among the interconnected computers of a distributed system. In practice, the concrete configuration and implementation of a distributed file system may vary, and it is difficult to determine the best implementation.
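
The notion of a client interface built from file operations can be illustrated with a minimal sketch. The class and method names below (`FileService`, `InMemoryFileService`) are hypothetical and do not come from the paper or from any particular distributed file system; the in-memory class merely stands in for a remote server, whereas a real client would forward each call over the network.

```python
from abc import ABC, abstractmethod


class FileService(ABC):
    """Hypothetical client interface: the primitives a file service exposes."""

    @abstractmethod
    def open(self, path): ...

    @abstractmethod
    def read(self, handle, size): ...

    @abstractmethod
    def write(self, handle, data): ...

    @abstractmethod
    def remove(self, path): ...


class InMemoryFileService(FileService):
    """Toy stand-in for a remote file server (assumption for illustration)."""

    def __init__(self):
        self._files = {}      # path -> bytearray holding the file content
        self._handles = {}    # handle -> [path, current offset]
        self._next_handle = 0

    def open(self, path):
        # Create the file if it does not exist and return a fresh handle.
        self._files.setdefault(path, bytearray())
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = [path, 0]
        return handle

    def read(self, handle, size):
        # Read up to `size` bytes from the handle's offset and advance it.
        path, offset = self._handles[handle]
        data = bytes(self._files[path][offset:offset + size])
        self._handles[handle][1] = offset + len(data)
        return data

    def write(self, handle, data):
        # Overwrite or extend the content at the handle's offset.
        path, offset = self._handles[handle]
        self._files[path][offset:offset + len(data)] = data
        self._handles[handle][1] = offset + len(data)
        return len(data)

    def remove(self, path):
        del self._files[path]


service = InMemoryFileService()
writer = service.open("/tmp/example.txt")
service.write(writer, b"hello")
reader = service.open("/tmp/example.txt")
print(service.read(reader, 5))
```

Keeping the interface abstract reflects the point made above: clients depend only on the set of primitives, while the configuration and implementation behind them may vary.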
A distributed file system can be implemented as part of a distributed operating system or by a software layer whose primary function is to manage the communication between conventional operating systems and file systems. Some examples of distributed file systems are Sun's Network File System (NFS), built on the Remote Procedure Call mechanism [5, 12] and broadly used on Unix-like systems; Prospero, an Internet-compatible virtual system model based on Uniform Resource Identifiers (URIs); and Coda, an experimental file system developed at Carnegie Mellon University [7, 12].

The paper proposes a high-level description of a virtual (distributed) file system using the Resource Description Framework (RDF) [3, 8], a model for processing metadata. RDF provides interoperability between applications that exchange machine-understandable information on the World-Wide Web. RDF is intended to capture and express the conceptual structure of information offered on the Web, in order to build the infrastructure for Berners-Lee's Semantic Web [1]. One of the major goals of RDF is to make it possible to specify semantics for data based on the Extensible Markup Language (XML) [2, 3, 15] in a standardized, platform-independent, and object-oriented manner. RDF can be used in resource discovery, in cataloging activities, by intelligent software agents, in content rating, in describing collections of data, etc.

The proposed RDF model can be applied to a particular distributed file system. To illustrate some specific issues we choose the Unix file system structure. For expressing various file properties, we present an XML-based language called the Extensible File Properties Markup Language (XFiles) [4]. The elements of the XFiles language will be used to specify RDF statements about the components of a distributed file system or about the relationships between these components.
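
To make the idea of RDF statements about files more concrete, the following minimal sketch models each statement as a (subject, predicate, object) triple and prints it in N-Triples notation. The namespace `http://example.org/xfiles#`, the property names (`owner`, `size`, `permissions`), and the file URI are assumptions chosen for illustration only; they are not the actual XFiles vocabulary.

```python
# Hypothetical namespace and property names, NOT the actual XFiles vocabulary.
XF = "http://example.org/xfiles#"


def describe_file(uri, properties):
    """Return RDF-style (subject, predicate, object) triples for one file."""
    return [(uri, XF + name, str(value)) for name, value in properties.items()]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every object as a plain literal."""
    lines = []
    for subject, predicate, obj in triples:
        # Escape backslashes, double quotes, and newlines inside the literal.
        literal = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'<{subject}> <{predicate}> "{literal}" .')
    return "\n".join(lines)


# Illustrative Unix-like file; the URI and values are made up for this example.
triples = describe_file(
    "file://host.example.org/home/user/report.txt",
    {"owner": "user", "size": 2048, "permissions": "rw-r--r--"},
)
print(to_ntriples(triples))
```

Each printed line is one statement: the file is the resource being described, the property identifies a characteristic, and the literal carries its value.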
The proposed RDF description can also be used to formulate high-level assertions about the main characteristics of a distributed file system or about relations between Web resources, in a standardized, platform-independent, and implementation-independent manner.

2. File Systems

The most visible aspect of an operating system, the file system consists of two distinct parts: the collection of the actual files, each containing related