A Scalable Autonomous Replica Management Framework for Grids
Vladimir Vlassov, Dong Li
Dept of Electronic, Computer and
Software Systems,
School for Information and
Communication Technology,
Royal Institute of Technology (KTH)
Stockholm, Sweden
vladv@kth.se, dongl@kth.se
Konstantin Popov
Swedish Institute of Computer
Science (SICS)
Kista, Sweden
kost@sics.se
Seif Haridi
Dept of Electronic, Computer and
Software Systems,
School for Information and
Communication Technology,
Royal Institute of Technology (KTH)
Stockholm, Sweden
seif@kth.se
Abstract
Data replication can reduce access time and
improve fault tolerance and load balancing. Typical
requirements for a replica management system include
an upper bound on replica Round Trip Time,
scalability, reliability, self-management and self-
organization, and the ability to maintain consistency of
mutable replicas. This article presents the design and a
prototype implementation of a scalable, autonomous,
service-oriented replica management framework for
Globus Toolkit Version 4 using DKS. DKS is a
structured peer-to-peer middleware. Grid nodes are
integrated into a P2P network. The framework uses the
ant metaphor and techniques of multi-agent systems for
collaborative replica selection. We also propose a
complementary “background” service that collects
access statistics and optimizes replica placement based
on access patterns and replica lifetime statistics. We
have tested and profiled the prototype.
1. Introduction
The Grid aims at the secure integration of
heterogeneous computing resources in a standard and
uniform way. Computational Grids provide sharing of
computing facilities, while Data Grids support
applications that work with large amounts of data
that can be distributed and replicated across the globe.
Managing storage facilities and data replicas in
large-scale, volatile Data Grids is a challenging task.
This research work is carried out under the FP6 Network of
Excellence CoreGRID funded by the European Commission
(Contract IST-2002-004265).
A replica management system determines the locations
of replicas, keeps track of them, and maintains
replica consistency. It should be scalable, optimize
replica placement in order to reduce access time and
communication costs, and maintain a required level of
consistency of mutable replicas [1]. Typical replica
management systems (e.g. [2][3]) consist of:
- Information services such as the replica location
service (RLS), user access history, and a metadata
catalog. Some services, in particular RLS, must
provide good scalability and fault-tolerance.
- A data transfer service, which provides file transfer
among collaborating Grid nodes.
- Security infrastructure, including authentication and
authorization for remote users, and secure
communication and data transfer.
- A data consistency management service, whose
requirements depend on the application
and data access patterns. In many scientific
applications the datasets are read-only, and therefore
replicas are automatically consistent. Applications
with mutable data should be allowed to choose an
optimum level of replica consistency.
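To make the role of a replica location service concrete, the following is a minimal in-memory sketch of the mapping such a service maintains: from a logical file name to the set of physical locations holding a replica. The class name `ReplicaCatalog`, its methods, and the example URLs are hypothetical and chosen for illustration only; they are not the interface of the RLS in [2][3] or of the framework presented here, and a real service would be distributed and fault-tolerant.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory replica location catalog (illustrative assumption)."""

    def __init__(self):
        # logical file name -> set of physical replica locations
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a replica of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one replica; drop the entry when no replicas remain."""
        locations = self._replicas.get(logical_name)
        if locations is None:
            return
        locations.discard(physical_url)
        if not locations:
            del self._replicas[logical_name]

    def lookup(self, logical_name):
        """Return the known replica locations, sorted for stable output."""
        return sorted(self._replicas.get(logical_name, ()))
```

A client would call `register` when a replica is created, `unregister` when it is deleted, and `lookup` before selecting a replica to read from.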
Our research aims at building a scalable higher-
level replica management system that provides
optimized replica selection and placement. Our system
prototype is implemented using the Globus Toolkit 4
(GT4) [4]. Our framework utilizes the DKS
(Distributed K-ary System) peer-to-peer middleware
[5][6], which allows Grid nodes to be organized into
self-organizing P2P overlay networks. The GT4 GridFTP
[7] is used for data transfer. Our framework contains
two specially developed components: the DKS Replica
Location Service (DKSRLS), built on the DKS P2P
network, and the Node Location Component (NLC),
based on the GT4 WS Monitoring and Discovery System
(MDS) aggregator framework. The framework also
provides a data consistency mechanism.
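As a rough illustration of how a replica location service can be layered on a structured P2P overlay, the sketch below places each logical file name on an identifier ring and assigns it to the first node at or after its identifier. This is a generic key-placement technique for structured overlays, shown under stated assumptions: the names `ring_id` and `IdentifierRing`, the use of SHA-1, and the 32-bit identifier space are hypothetical and do not describe the DKS interface or its actual routing.

```python
import bisect
import hashlib


def ring_id(key, bits=32):
    """Hash a string onto an identifier ring of size 2**bits."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class IdentifierRing:
    """Toy identifier ring mapping keys to responsible nodes (assumption)."""

    def __init__(self, node_names):
        # Sorted (identifier, node name) pairs around the ring.
        self._nodes = sorted((ring_id(name), name) for name in node_names)
        self._ids = [node_id for node_id, _ in self._nodes]

    def responsible_node(self, logical_name):
        """Return the first node at or after the key's identifier,
        wrapping around to the start of the ring if necessary."""
        index = bisect.bisect_left(self._ids, ring_id(logical_name))
        return self._nodes[index % len(self._nodes)][1]
```

Because every node computes the same placement from the key alone, any node can determine where the replica locations for a given logical file name are stored without consulting a central index.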
The dynamicity of the Grid environment presents
challenges for replica management. In order to simplify
replica management and consistency maintenance for
Grid users and administrators, and to improve
reliability, a replica management system must be
IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA'06)
0-7695-2643-8/06 $20.00 © 2006