WWW Cache Modelling Toolbox

George Bilchev, Chris Roadknight, Ian Marshall, and Sverrir Olafsson

BT Research Laboratories, Martlesham Heath, Ipswich, Suffolk, IP5 3RE, UK
{george.bilchev, sverrir.olafsson}@bt-sys.bt.co.uk
{roadknic, marshall}@drake.bt.co.uk

Abstract. This paper develops and implements a model of World Wide Web cache infrastructure for analysing features that are otherwise difficult to extract from existing log data and for evaluating cache scenarios that do not yet exist. A prominent feature that distinguishes our model from similar models is that it is dynamic, which allows temporal features to be investigated. Using the model, we verify and quantify an observation made from real log data: the popularity of Web pages becomes more diverse the higher a cache sits in the cache hierarchy. We then use the model to predict cache population dynamics in a hypothetical scenario with sufficiently large caches.

1. Introduction

Proxy caching has become an established technique for enabling effective file delivery within the World Wide Web architecture [1][2]. Adding file caching agents brings several benefits, including robustness (by distributing files more widely), a possible reduction in total bandwidth requirements (by moving popular files closer to the clients), and reduced load on origin servers, especially those serving popular files. Understanding the precise costs and benefits of inserting caches into the network is an important goal for network management and design. To understand what affects a cache’s behaviour and performance, it has been essential to analyse the behaviour of WWW caches currently in operation [3][4][5]. This analysis sheds some light on the inter-relationships between cache metrics and on possible causes of observed behaviour [6], but it only covers caches in existing locations serving existing communities.
It is therefore highly desirable to be able to model cache behaviour so that cache scenarios that do not yet exist can be evaluated. A cache-enabled WWW modelling toolbox would be useful not only to network planners but also to Internet researchers looking for a simple, flexible model against which to test theories. In this paper we develop a WWW cache model that is easy to use, cheap to implement and fast to simulate. The model requires only a few simple input values, yet it is realistic enough to verify observations made from real cache data. We believe the model will be of particular interest to operators' technical staff who are not modelling specialists and who must work to short timescales.

2. Previous Work

There are two main approaches to WWW cache modelling. The first concerns modelling individual components, for example generating representative Web traffic and feeding it to real caches in order to test and analyse them. Two well-known examples of this approach are the Wisconsin Proxy Benchmark [7] and SURGE [8]. The second approach consists of mathematical modelling at a higher level of abstraction, where the variables of interest usually represent average values. For example, [9] uses probability theory to derive explicit formulas for the hit rate of a single proxy as a function of cache size. Another example is [10], where the authors describe a model comprising several levels of a caching infrastructure. The advantage of the mathematical approach is that the models are relatively fast to simulate. The disadvantage is that these models often lack the desired detail, i.e., they rely on simplifying assumptions whose effects may turn out to be significant. In this paper we adopt the second approach, since it allows for larger-scale models that are cheap to implement and easy to simulate.

3. Modelling WWW Caches

Previous mathematical models of WWW caching [9][10] mostly use variables that represent aggregated average values, e.g., the average distance of a WWW server from its clients, the average document size, or the average number of requests made by a single client. Although these models are not computationally intensive to simulate, their modelling granularity makes