Self-organising data networks

G Bilchev and S Olafsson

In this paper we propose a distributed information system infrastructure and, through simulation models, demonstrate its potential. We also argue for the economic viability of the suggested infrastructure in the context of the Internet and identify the interested parties that can drive the deployment of the system. Two different types of infrastructure are discussed, one based on data replication and the other based on caching. The two infrastructures differ mainly in terms of their maintenance, ownership and management.

1. Introduction

Network caching and replication involve the storing of multiple copies of data objects in distributed locations throughout the network. Access to data is provided by these nearby copies, avoiding the need to go all the way to the original source. The results are:

• reduced user-perceived latency,
• reduced bandwidth consumption on the backbone network,
• load reduction on the origin server,
• improved data availability through redundancy,
• the possibility of providing quality-of-service (QoS) guarantees to the end user.

The paper discusses in some detail the algorithms used for data replication and caching. We also introduce simulation models that have been developed, demonstrating the potential benefits that efficient file distribution brings to content providers (CPs) and Internet service providers (ISPs).

2. Self-organising networks

2.1 Replication

In the replication scenario the CPs would like to replicate their services to the edge of the network (i.e. at the ISP servers) so that a better QoS can be provided to the end users. Depending on the CP's points of presence, certain ISPs will be more attractive than others. For example, a CP in the USA with good US backbone connections may only consider replication into European or Asian ISPs, provided that the CP offers services in those regions.

Given the above scenario, we have built a mathematical model of the replication network. It consists of a number of nodes, each of which can represent an ISP, a CP, or both; this is the general case, since a CP can also be an ISP and vice versa. Assumptions are made about the average connection speed between the nodes. Furthermore, we make assumptions about the data request patterns and derive from them the server load using standard queuing theory. We have assumed that the most important performance measure is the user-perceived latency, since it directly affects the QoS that can be provided to the customers. Other measures of interest may include the incurred financial cost.

Running the simulation model enables us to analyse different scenarios and compute, for each one, the average user-perceived latency. We adopt a view of the network as a whole and are interested in minimising the average latency rather than measuring the performance from the perspective of a single CP. In the present work we discuss two different adaptation algorithms for the optimisation of the average network performance (a minimal sketch of such a latency model, with a greedy adaptation step, is given at the end of this section).

2.2 Caching

The model starts by analysing current WWW usage patterns. We have identified that daily usage patterns typically consist of an underlying trend with a superimposed stochastic component. To model the underlying trend we have used a superposition of periodic functions. Once the trend has been approximated, the stochastic component is modelled as a series of Brownian motion processes (i.e. random walks); a sketch of such a synthetic usage series is given below.
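As an illustration of the usage model of section 2.2, the following minimal Python sketch builds the trend as a superposition of two sinusoids and adds a Gaussian random walk as the stochastic component. The hourly resolution, baseline rate, 24 h and 12 h periods, amplitudes and walk variance are illustrative assumptions, not fitted values from our measurements.

```python
import math
import random

random.seed(1)

HOURS = 24 * 7          # one simulated week at hourly resolution
BASE = 100.0            # assumed baseline request rate (requests/s)

def trend(t):
    """Underlying daily trend as a superposition of periodic functions.

    Here: a dominant 24 h cycle plus a smaller 12 h harmonic; the
    amplitudes and phases are illustrative assumptions.
    """
    return (BASE
            + 40.0 * math.sin(2 * math.pi * t / 24 - math.pi / 2)
            + 15.0 * math.sin(2 * math.pi * t / 12))

def synthetic_usage():
    """Trend plus a Brownian-motion (random-walk) stochastic component."""
    walk, series = 0.0, []
    for t in range(HOURS):
        walk += random.gauss(0.0, 3.0)   # random-walk increment
        series.append(max(0.0, trend(t) + walk))
    return series

usage = synthetic_usage()
print("peak: %.1f  trough: %.1f" % (max(usage), min(usage)))
```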
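For the replication model of section 2.1, a similarly minimal sketch follows. It does not reproduce the adaptation algorithms studied in the paper; instead it uses a simple greedy placement as a stand-in: server delay is approximated by the standard M/M/1 waiting time 1/(mu - lambda), each node routes its requests to the nearest replica, and replicas are added one at a time wherever they most reduce the network-wide average latency. The node count, network delays, service rate and demand figures are all illustrative assumptions.

```python
import random

random.seed(0)

N = 8                  # nodes; each may act as an ISP, a CP, or both
SERVICE_RATE = 150.0   # assumed server capacity mu (requests/s)

# Assumed symmetric propagation delays between nodes, in seconds.
delay = [[0.0 if i == j else random.uniform(0.02, 0.3) for j in range(N)]
         for i in range(N)]
# Assumed request rate lambda_i generated at each node for one CP's content.
demand = [random.uniform(5.0, 15.0) for _ in range(N)]

def avg_latency(replicas):
    """Average user-perceived latency when the nodes in `replicas` hold copies.

    Each node routes its requests to the nearest replica; server delay at
    a replica is approximated by the M/M/1 waiting time 1/(mu - lambda).
    """
    load = {r: 0.0 for r in replicas}
    nearest = {}
    for i in range(N):
        r = min(replicas, key=lambda n: delay[i][n])
        nearest[i] = r
        load[r] += demand[i]
    total = 0.0
    for i in range(N):
        r = nearest[i]
        if load[r] >= SERVICE_RATE:    # overloaded replica: infeasible placement
            return float("inf")
        total += demand[i] * (delay[i][r] + 1.0 / (SERVICE_RATE - load[r]))
    return total / sum(demand)

def greedy_placement(k):
    """Add replicas one at a time, each time picking the node that most
    reduces the network-wide average latency (a simple adaptation step)."""
    replicas = set()
    while len(replicas) < k:
        best = min((n for n in range(N) if n not in replicas),
                   key=lambda n: avg_latency(replicas | {n}))
        replicas.add(best)
    return replicas

placement = greedy_placement(3)
print("replicas:", sorted(placement),
      " avg latency: %.3f s" % avg_latency(placement))
```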
As both algorithms are autonomous and depend on the user request distribution, they can be viewed as examples of self-organising networks.