Exploiting templates to scale consistency maintenance in edge database caches

Khalil Amiri, IBM Canada, Toronto, ON, kamiri@ca.ibm.com
Sara Sprenkle, Duke University, Durham, NC, sprenkle@cs.duke.edu
Renu Tewari, IBM Almaden Research Center, San Jose, CA, tewarir@us.ibm.com
Sriram Padmanabhan, IBM T.J. Watson Research Center, Hawthorne, NY, srp@us.ibm.com

Abstract

Semantic database caching is a self-managing approach to the dynamic materialization of “semantic” slices of back-end databases on servers at the edge of the network. It can be used to enhance the performance of distributed Web servers, information integration applications, and Web applications offloaded to edge servers. Such semantic caches often rely on update propagation protocols to maintain consistency with the back-end database system. However, the scalability of these update propagation protocols remains a major challenge. In this paper, we focus on the scalability of update propagation from back-end databases to edge server caches. In particular, we propose a publish-subscribe-like scheme for aggregating cache subscriptions at the back-end site, which improves the scalability of the filtering step required to route updates to the target caches. Our proposal exploits the template-rich nature of Web applications and promises significantly better scalability. We describe our approach, discuss the tradeoffs that arise in its implementation, and estimate its scalability compared to naive update propagation schemes.

The author performed this work while at the IBM T. J. Watson Research Center.

1 Introduction

The performance and scalability of Web applications continue to be a critical requirement for content providers. Traditionally, static caching of HTML pages on edge servers has been used to help meet this requirement.
However, as a growing fraction of content becomes dynamic and requires access to the back-end database, static caching is bypassed, because the server marks all dynamically generated pages as uncacheable.

Dynamic data is typically served using a three-tier Web-serving architecture consisting of a Web server, an application server, and a database: data is stored in the database, accessed on demand by the application server components, and formatted and delivered to the client by the Web server. In more recent architectures, the edge server (a term covering client-side proxies, server-side reverse proxies, and caches within a content distribution network (CDN) [2]) acts as an application server proxy by running application components (e.g., JSPs, servlets, EJBeans) that have been offloaded to the edge [12, 7]. The data accessed by these edge application components, however, must still be retrieved from the back-end server over the wide-area network.

To accelerate edge applications by eliminating wide-area network transfers, we have recently proposed and implemented DBProxy, a database cache that dynamically and adaptively stores structured data at the edge [4]. The cache in this scenario is a persistent edge cache containing a large number of changing