DFSgc: Distributed File System for Multipurpose Grid Applications and Cloud Computing

Carlos de Alfonso¹, Miguel Caballer¹, José V. Carrión¹, Vicente Hernández¹
¹ Instituto ITACA - Universidad Politécnica de Valencia
email: [calfonso,micafer,jocarbur,vhernand]@itaca.upv.es

Abstract

Grid Computing currently needs support for managing huge quantities of storage. Most Grid deployments provide only local storage support and lack essential features such as decentralization of the file catalog. This paper explains the design of a distributed file system whose aim is to provide virtual storage for multipurpose applications. A completely distributed catalog system tolerates faults, so that the entire catalog of files remains available at all times. The idea is to allow users and applications to manage files as they would in a local storage system, hiding the physical addressing. An intelligent replication system uses heuristic techniques to bring data closer to the Grid applications, so that the applications avoid spending time on transfers.

1 Introduction

Grid Computing has evolved and currently needs support for managing huge quantities of storage. Most Grid deployments provide only local storage support, so applications have to guess where the data files are actually stored and transfer them to where they are needed.

There are several approaches to virtualizing data storage, such as EGEE's Replica Service [1], the LCG File Catalog [2], the Replica Location System from Globus and DataGrid [3], the Hadoop Distributed File System [4] and Microsoft's Distributed File System [5]. None of them has become established in the community, because they omit essential features such as:

- Decentralization of the file catalog, in order to avoid a central server that manages the entire catalog of files.
- Seamlessly bringing the data closer to where it is needed, or automating intelligent replication.
- Creating a uniform namespace, as current systems assume that applications know how files are named and organized.

Most of these file systems were created for specific purposes, since many of them were developed as part of a project. They therefore focus on the specific needs of the data managed in that project.

In recent years Grid computing has reached a degree of stability, and its use has become widespread in scientific environments for multipurpose applications. The advent of Cloud computing has also exposed the need for ubiquitous storage systems that enable applications to distribute data files seamlessly, without having to create specific networks or storage systems, or to resort to workarounds to take advantage of existing systems. There is a need to have at one's disposal infrastructure services which provide