Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid

Mark Carman, Floriano Zini, Luciano Serafini
ITC-irst
Via Sommarive 18
38050 Povo (Trento), Italy
{carman, zini, serafini}@itc.it

Kurt Stockinger
CERN
Geneva, Switzerland
Kurt.Stockinger@cern.ch

Abstract

We are working on a system for the optimised access and replication of data on a Data Grid. Our approach is based on the use of an economic model that includes the actors and the resources in the Grid. Optimisation is obtained via interaction of the actors in the model, whose goals are maximising the profits and minimising the costs of data resource management. In the system, local optimisation results in global optimisation through emergent marketplace behaviour. In this paper we give an overview of our model and present part of the complex economic reasoning required to support this desired marketplace interaction model.

1. Introduction

In a typical Grid environment, where many users share limited amounts of computing and storage resources, the optimisation of resource usage is very important in order to guarantee reasonable execution time and/or cost of users’ tasks as well as fairness among them. Such environments are typically highly heterogeneous and the resources themselves are dynamic in nature. In the case of the so-called Data Grids, there is the added problem of needing to manage vast quantities of data (up to several Petabytes) [5]. Here, the main challenge is the improvement of data access efficiency given the limited number and size of storage devices available, which in turn constrains the amount of data replication that can be carried out.

In a Data Grid a user typically submits a job to the Grid from her workstation, which is located at a particular site on the Grid, and requires that the job be executed as fast as possible¹.
To execute, a job basically requires three kinds of resources: computational resources, data resources, and network resources. Ideally, a Grid optimisation service should be able to manage the usage of these resources in order to bring the needs of a single user into agreement with the demands of the whole community of Grid users. Optimisation should be carried out based on the status of Grid resources (workload and features of computation sites, location of data, network load) and should result in the allocation of a convenient site for job execution, as well as the allocation of a convenient replica of the job’s input data (possibly involving dynamic replication of data between Grid sites).

¹ Note that for simplicity we assume that a job is atomic and thus cannot be decomposed into subjobs.

In this paper we focus on a particular aspect of optimisation and deal primarily with the problem of optimising the replication of data in a Grid environment, that is, with deciding when and where to create and delete replicas of data files. The aim here is to minimise the overall cost of file access on the Grid in the “long-term” [3], given a finite amount of storage resources. We do, however, also deal with the complementary problem, which is that of selecting the optimal replicas of data for use by a job currently executing within the Grid environment (which we referred to as the “during job execution” optimisation in [3]).

We do not tackle the problem of job scheduling on the Grid, i.e. the problem of deciding where and when to schedule jobs for execution. We assume that jobs are dispatched for execution to different sites on the Grid by some scheduling system, which uses knowledge of the available computational, data, and network resources to make “rational” scheduling decisions.
Even though job scheduling and replica selection are related, to keep our model simple we assume that our optimisation starts once the scheduling decision has already been made and the job has started (or is about to start) execution on a particular site.

We propose a fully distributed optimisation of data access and replication, based on an economic model for the interaction of different optimisation units at each node/site on the network. The main focus is on optimising local re-