A CONTENT-TYPE BASED EVALUATION OF WEB CACHE REPLACEMENT POLICIES

F.J. González-Cañete, E. Casilari, A. Triviño-Cabrera
Department of Electronic Technology, University of Málaga, Spain
University of Málaga, E.T.S.I. Telecomunicación, Campus de Teatinos, 29071, Málaga, Spain
{fgc,ecasilari,atc}@uma.es

ABSTRACT

This paper studies the performance of six replacement policies when only one document content-type (Application, Audio, Images, Text or Video) is considered at a time, with the aim of implementing a proxy cache that differentiates between types of traffic. The classical caching algorithms LRU, LFU and LFU-DA and the caching schemes specifically developed for Web documents, GD-SIZE, GDSF and GD*, have been studied. Using a trace log from a real proxy cache, the main properties of the documents of each content-type have been characterized. Finally, a trace-driven simulation study of the performance of the six replacement policies has been carried out for the traffic generated by each content-type considered. In this way, we can determine which replacement policies perform best for each content-type and cache size.

KEYWORDS

Web caching, replacement policies, document content-types.

1. INTRODUCTION

The Internet and the World Wide Web (the Web) are continuously evolving and growing, and many efforts have therefore been made to optimize them. One of the most important optimization techniques is Web proxy caching, which stores the documents requested by users close to them. Since it was proposed in (Luotonen, 1997), Web proxy caching has been used to reduce the latency perceived by users, the HTTP traffic and the server load.
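As an illustration of how a proxy cache with a replacement policy can be evaluated by replaying a request trace, the following is a minimal sketch of a byte-limited LRU cache, one of the classical policies named in the abstract. It is not the simulator used in this study: the function name `simulate_lru`, the trace format (a sequence of `(url, size_bytes)` pairs) and the two reported metrics (hit ratio and byte hit ratio) are assumptions made only for this example.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (url, size_bytes) requests through a byte-limited LRU cache.

    Hypothetical helper for illustration; returns (hit_ratio, byte_hit_ratio).
    """
    cache = OrderedDict()  # url -> size, ordered from least to most recently used
    used = 0               # bytes currently stored in the cache
    hits = hit_bytes = total = total_bytes = 0

    for url, size in trace:
        total += 1
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # the document can never fit, so it is not cached
        # Evict least recently used documents until the new one fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[url] = size
        used += size

    if total == 0:
        return 0.0, 0.0
    byte_hit_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return hits / total, byte_hit_ratio


# Toy trace (assumed data, not taken from any real proxy log).
toy_trace = [("a", 100), ("b", 200), ("a", 100), ("c", 300), ("b", 200)]
hit_ratio, byte_hit_ratio = simulate_lru(toy_trace, capacity_bytes=400)
```

Restricting the trace to the requests of a single content-type before replaying it would give a per-type evaluation in the spirit of the study, under the same assumptions.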
After this original proposal of a Web proxy cache, many research activities have aimed to study and develop replacement policies (Podlipnig, 2003) (Balamash, 2004), algorithms for cache coherence (Krishnamurthy, 1999) and cache architectures (Busari, 2000) in order to improve the performance of the caching system. One of the main research lines is based on differentiating the types of documents that are present in the Web (Images, Text, Video, …). Khayari proposed storing in the cache only the most frequently demanded document types (mpeg, gif, jpg, flash, html and plain), although his proposal did not improve cache performance (Khayari, 2005). In this paper we determine, by means of simulations, the best replacement policy for each document type.

This paper is organized as follows. Section 2 summarizes the trace processing and the statistical characterization of the workload based on the content-type. Section 3 lists the replacement policies considered and evaluates the performance of a proxy cache that takes into account only one content-type of the downloaded documents. Finally, Section 4 presents the main conclusions of this paper.

2. TRACE PROCESSING AND CHARACTERIZATION

To evaluate the performance of a cache that considers only one document content-type at a time, a workload trace containing HTTP requests from a proxy of the IRCache project has been used (IRCache). This proxy is located in Research Triangle Park (North Carolina, USA). The traces include requests from the 7th to the 11th of June 2004 generated by the Squid Web proxy cache software (Squid

ISBN: 978-972-8924-30-0 © 2007 IADIS