Replication algorithms for the World-Wide Web

Fathi Tenzakhti a, Khaled Day b, M. Ould-Khaoua a,*

a Department of Computing Science, University of Glasgow, Glasgow G12 8RZ, UK
b Department of Computer Science, Sultan Qaboos University, P.O. Box 36, Al-Khod, 123 Muscat, Oman

* Corresponding author. Tel.: +44-141-330-6056; fax: +44-141-330-4913. E-mail addresses: fathit@squ.edu.om (F. Tenzakhti), kday@computer.org (K. Day), mohamed@dcs.gla.ac.uk (M. Ould-Khaoua).

Received 23 April 2002; received in revised form 5 February 2003; accepted 10 December 2003. Available online 2 March 2004.

Journal of Systems Architecture 50 (2004) 591–605. doi:10.1016/j.sysarc.2003.12.003

Abstract

This paper addresses two fundamental issues in replication: deciding on the number and placement of replicas, and distributing requests among the replicas. We first introduce a centralized algorithm for replicating objects that keeps the load on sites balanced. To cope with the dynamic nature of Internet traffic and the rapidly changing access patterns of the World-Wide Web (Web), we also propose a distributed algorithm in which each site uses collected information to decide where to replicate and migrate objects in order to achieve good performance. The performance of the proposed algorithms is evaluated experimentally, and a comparison of their measured performance is presented.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Replication; Centralized/distributed replication; Load balancing; Web; Performance analysis

1. Introduction

With the growth of World-Wide Web (Web) traffic, client requests to popular web sites have increased disproportionately, and system administrators are constantly faced with the need to scale up site capacity. There are two ways to achieve this. One consists of enhancing the resources themselves, for example by replacing the current hosts with larger and more powerful ones. Unfortunately, this approach is expensive and is complicated by the absence of a central authority responsible for system administration. A more promising approach is to manage the available resources more effectively. A key idea is to provide replicated sites at different locations, reducing the number of object retrievals over long distances and balancing the load of popular sites [3,4,7]. This reduces cost and improves the overall response time over the Web.

The replication of objects is a well-known technique commonly used to address the scalability problem of popular sites. Replication improves object availability in the presence of site crashes and network failures [1,5,18], and increases efficiency by allowing operations to use local replicas instead of remote ones [2,4]. Furthermore, the load on popular sites can be shared among the replicas.
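As a rough illustration of the placement question raised here (the paper's own centralized and distributed algorithms are developed in later sections), the sketch below shows a generic greedy replica-placement heuristic. It is not taken from this paper: the cost model (each site's read rate times its distance to the nearest replica, plus a per-replica storage cost) and all identifiers (greedy_placement, total_cost, read_rate, dist, storage_cost) are illustrative assumptions.

```python
# Minimal greedy replica-placement sketch (illustrative only, not the
# paper's algorithm). Assumed cost model: each site's read rate times
# its distance to the nearest replica, plus a fixed storage cost per
# replica. Replicas are added as long as doing so lowers total cost.

def total_cost(replicas, read_rate, dist, storage_cost):
    # Access cost: every site fetches the object from its nearest replica.
    access = sum(rate * min(dist[s][r] for r in replicas)
                 for s, rate in read_rate.items())
    return access + storage_cost * len(replicas)

def greedy_placement(sites, origin, read_rate, dist, storage_cost):
    replicas = {origin}  # the object's home site always holds a copy
    while True:
        best_site = None
        best_cost = total_cost(replicas, read_rate, dist, storage_cost)
        for s in sites - replicas:  # try each remaining candidate site
            cost = total_cost(replicas | {s}, read_rate, dist, storage_cost)
            if cost < best_cost:
                best_site, best_cost = s, cost
        if best_site is None:  # no candidate lowers the total cost
            return replicas
        replicas.add(best_site)

# Hypothetical three-site example: A is the origin, A and C read the
# object heavily, and B sits between them.
sites = {"A", "B", "C"}
dist = {"A": {"A": 0, "B": 5, "C": 9},
        "B": {"A": 5, "B": 0, "C": 4},
        "C": {"A": 9, "B": 4, "C": 0}}
read_rate = {"A": 10, "B": 2, "C": 8}
print(greedy_placement(sites, "A", read_rate, dist, storage_cost=15))
```

Greedy placement of this kind is a standard heuristic for facility-location-style problems and commonly serves as a baseline against which replication algorithms, such as those proposed in this paper, can be compared.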