Computer Networks 33 (2000) 33–49
www.elsevier.com/locate/comnet

SPREAD: scalable platform for reliable and efficient automated distribution

Pablo Rodriguez a,*,1, Sandeep Sibal b,2

a Institut EURECOM, 2229 Route des Cretes, 06904 Sophia Antipolis Cedex, France
b AT&T Labs Research, B-129, 180 Park Avenue, Florham Park, NJ 07932, USA

* Corresponding author. E-mail: rodrigue@eurecom.fr
1 During the period of this work, he was an intern at AT&T Labs Research.
2 E-mail: sibal@research.att.com

1389-1286/00/$ – see front matter. PII: S1389-1286(00)00086-4

Abstract

We introduce SPREAD, a new architecture for distributing and maintaining up-to-date Web content that simultaneously employs three different mechanisms: client validation, server invalidation, and replication. Proxies within SPREAD self-configure to form scalable distribution hierarchies that connect the origin servers of content providers to clients. Each proxy autonomously decides on the best mechanism based on the object's popularity and modification rates. Requests and subscriptions propagate from edge proxies to the origin server through a chain of intermediate proxies. Invalidations and replications travel in the opposite direction. SPREAD's network of proxies automatically reconfigures when proxies go down or come up, or when new ones are added. The ability to spontaneously form hierarchies is based on a modified transparent proxying mechanism, called translucent proxying, that sanitizes transparent proxying. It allows proxies to be placed in an ad-hoc fashion anywhere in the network, not just at focal points within the network that are guaranteed to see all the packets of a TCP connection. In this paper we (1) describe the architecture of SPREAD, (2) discuss how proxies determine which mechanism to use based on local observations, and (3) use a trace-driven simulation to test SPREAD's behavior in a realistic setting. © 2000 Published by Elsevier Science B.V. All rights reserved.

Keywords: Content distribution; Consistency; Automated; Hierarchy; Caching; Replication

1. Introduction

Due to the explosive growth of the World Wide Web, Internet service providers (ISPs) throughout the world are installing proxy caches to reduce user-perceived latency as well as bandwidth consumption. Such proxy caches are under the control of the ISP, and usually cache content for its client community, irrespective of the origin server. These proxy caches are often called forward proxy caches to distinguish them from reverse proxy caches, which we discuss next.

More recently, several vendors, such as Akamai [1] and Sandpiper [17], have begun offering proxy-based solutions to content providers, as opposed to ISPs. The business model here is that improving a user's browsing experience is not only in the ISP's interest, but in the content provider's interest as well. This is becoming increasingly important as content providers multiply and compete for the attention of end users. Proxy caches used in such a scenario are often called reverse proxy caches, to underline the fact that they are controlled by and