Computer Networks and ISDN Systems 30 (1998) 2203–2209

A filtering algorithm for Web caches

Michal Kurcewicz *,1, Wojtek Sylwestrzak 1, Adam Wierzbicki 1

Interdisciplinary Center for Mathematical and Computational Modeling, University of Warsaw, ul. Pawinskiego 5a, 02-106 Warsaw, Poland

Abstract

Most proxies use an aggressive caching policy: they cache all objects that can be cached. This policy has the advantage that, given sufficient disk space, it maximizes the hit rate. However, maximizing the hit rate does not necessarily maximize proxy cache performance. Cache administrators often report that disks are the main bottleneck of busy cache servers. Overloaded disks increase service time, which lowers the overall performance of the proxy. The high disk load is caused mainly by writing all cacheable objects to disk. As an alternative caching policy, we propose the use of filtering algorithms. These algorithms select from the request stream the objects that are likely to be accessed in the future. Only these objects are cached; the others are forwarded to clients without making a local copy. We present a filtering algorithm that is based on the sharing of origin servers among clients. We show that our approach gives a significantly lower disk load while maintaining high hit rates. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Proxy cache; Performance analysis

1. Introduction

Most past work on caching policies and object replacement algorithms has focused on maximizing the hit rate, because the hit rate was seen as the best measure of the benefits of caching. However, maximizing the hit rate does not necessarily maximize proxy cache performance. A cache spends much of its time on disk operations, especially on disk writes. To maximize the hit rate, proxies cache all objects that can be cached. Such a policy, as we shall demonstrate, generates a high load on disks. Overloaded disks increase service time, which lowers the overall performance of the proxy.
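The trade-off between hit rate and disk writes can be made concrete with a toy replay. The sketch below is illustrative only and is not the algorithm evaluated in this paper: the trace, the function names, the unbounded cache, and the admission rule (cache an object only once its origin server has been requested by at least `min_clients` distinct clients) are all assumptions chosen to show the idea of filtering on shared origin servers.

```python
from collections import defaultdict

def simulate(trace, admit):
    """Replay (client, server, path) requests against an unbounded cache.

    `admit(client, server)` is called for every request so a filter can
    update its state; its return value only matters on a miss.
    Returns (hits, disk_writes).
    """
    cache = set()
    hits = writes = 0
    for client, server, path in trace:
        wanted = admit(client, server)
        key = (server, path)
        if key in cache:
            hits += 1
        elif wanted:
            cache.add(key)   # a miss that is admitted costs one disk write
            writes += 1
    return hits, writes

def cache_all(client, server):
    """Aggressive policy: every missed object is written to disk."""
    return True

def make_shared_server_filter(min_clients=2):
    """Hypothetical filter: admit objects only from servers that at least
    `min_clients` distinct clients have requested so far."""
    clients_per_server = defaultdict(set)

    def admit(client, server):
        clients_per_server[server].add(client)
        return len(clients_per_server[server]) >= min_clients

    return admit

# Made-up trace: one server shared by three clients, one used by a single client.
TRACE = [
    ("c1", "news.example", "/index"),
    ("c2", "news.example", "/index"),
    ("c3", "news.example", "/index"),
    ("c1", "rare.example", "/a"),
    ("c1", "rare.example", "/b"),
    ("c2", "news.example", "/sport"),
    ("c3", "news.example", "/sport"),
]

all_hits, all_writes = simulate(TRACE, cache_all)
flt_hits, flt_writes = simulate(TRACE, make_shared_server_filter(2))
print("cache all:", all_hits, "hits,", all_writes, "writes")
print("filtered: ", flt_hits, "hits,", flt_writes, "writes")
```

On this made-up trace, caching everything yields 3 hits for 4 disk writes, while the filter yields 2 hits for 2 disk writes: the two objects from the unshared server are never written. The numbers say nothing about real workloads; they only illustrate why a small loss in hit rate may buy a large reduction in writes.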
It is common knowledge that cache hit rates seldom exceed 50%. This is caused by the large number of unpopular and moderately popular objects, i.e. objects that are requested only once or at most a few times over the observation period. In most cases, such objects are never accessed after they have been placed in the cache. The aim of this work is to demonstrate that caching unpopular objects can be avoided and that doing so may improve overall cache performance.

The rest of the paper is organized as follows. In the next two sections, we describe the traces used in our analysis and present a model of a proxy cache. Section 4 describes our trace-driven cache simulator. In Section 5, we present a filtering algorithm that places in the cache only objects that are likely to be accessed in the future, and we analyze its performance. Section 6 concludes the paper.

* Corresponding author.
1 E-mail: {mkur,wojsyl,adamw}@icm.edu.pl.

0169-7552/98/$ – see front matter. PII: S0169-7552(98)00241-4