Analysis of Caching Mechanisms from Sporting Event Web Sites

Zhen Liu, Mark Squillante, Cathy Xia, S. Yu, Li Zhang, Naceur Malouch, and Paul Dantzig

IBM Research Division
Thomas J. Watson Research Center
Yorktown Heights, NY 10598
{zhen, mss, hhcx, shuzheng, zhangl, nmmalouc, pauldant}@watson.ibm.com

Abstract. Caching mechanisms are commonly implemented to improve the user experience as well as the server scalability at popular web sites. With multi-tier, geographically distributed caches, it is often difficult to quantify the benefit provided by each tier of caches. In this paper we present and analyze the design of a web serving architecture that has been successfully used to host a number of recent, popular sporting event web sites with two tiers of caches. Special mechanisms are incorporated in this design that allow us to infer the cache performance at the middle-tier of reverse-proxy caches. Our results demonstrate a very high hit ratio (i.e., around 90%) for the reverse-proxy caches employed in this web serving architecture, which is sustained throughout the day and across all geographical regions being served. This is primarily due to system design mechanisms that allow almost all of the dynamic content to be cached, as well as to a significantly larger locality of reference among the users of sporting event web sites than that found in other web environments. These mechanisms also make it possible for us to separate the true user request patterns at the page level from any additional requests induced by the server architecture and implementation.

1 Introduction

With the growing popularity of the World Wide Web, a dominant amount of information and services are delivered to many people around the world from web sites of various companies and organizations.
The techniques used to handle user requests for information and services at such web sites must provide levels of performance and scalability that can accommodate the growth and evolution of these environments.

A particularly challenging and interesting class of web sites are those that support sporting events that are of interest to people all over the world. These web sites tend to have some of the highest certified peak and sustained hit rates, with each event setting new records over previous events; e.g., see [5,1]. Moreover, such web sites must provide the latest information about various aspects of the sporting event that are constantly changing, and thus much of the content being served by these sites is dynamic. This in particular makes the problem of effectively caching the web content, in order to handle the high-volume request rates, especially difficult.

To address this problem, we present the design of a web serving architecture with features that make it possible to serve dynamic content at the performance level of serving

A. Jean-Marie (Ed.): ASIAN 2002, LNCS 2550, pp. 76–86, 2002. © Springer-Verlag Berlin Heidelberg 2002