Second Chance: A Hybrid Approach for Dynamic Result Caching in Search Engines

I. Sengor Altingovde¹, Rifat Ozcan¹, B. Barla Cambazoglu², and Özgür Ulusoy¹

¹ Department of Computer Engineering, Bilkent University, Ankara, Turkey
{ismaila,rozcan,oulusoy}@cs.bilkent.edu.tr
² Yahoo! Research, Barcelona, Spain
barla@yahoo-inc.com

Abstract. Result caches are vital for the efficiency of search engines. In this work, we propose a novel caching strategy in which a dynamic result cache is split into two layers: an HTML cache and a docID cache. The HTML cache in the first layer stores the result pages computed for queries. The docID cache in the second layer stores ids of documents in search results. Experiments under various scenarios show that, in terms of average query processing time, this hybrid caching approach outperforms the traditional approach, which relies only on the HTML cache.

Keywords: Search engines, query processing, result cache.

1 Introduction

Result caching is a crucial mechanism employed in search engines to satisfy low response time and high throughput requirements under high query workloads [2]. Usually, a static result cache is filled by the result pages of queries that were frequent in the past. Additionally, a dynamic result cache is maintained to handle the burst in query traffic. The content of the result cache changes dynamically depending on the query stream. Each time the cache is full, an entry is evicted from the cache based on a certain replacement policy (e.g., LRU). A real-life search engine might either split the available cache capacity between static and dynamic caches [3], or involve a sufficiently large dynamic cache that would almost never evict frequent queries, as if they were kept in a static cache.

In the design and evaluation of caching strategies, a traditionally used measure is the cache hit rate.
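As a minimal sketch of the dynamic result cache described above, assuming LRU as the replacement policy, the following Python class evicts the least recently used entry when the cache is full and tracks the hit rate. The names `LRUResultCache` and `compute_results` are hypothetical and introduced only for illustration; they are not taken from the paper.

```python
from collections import OrderedDict


class LRUResultCache:
    """Dynamic result cache with LRU eviction (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query string -> cached result page
        self.hits = 0
        self.lookups = 0

    def get(self, query, compute_results):
        """Return the cached page for `query`, computing it on a miss."""
        self.lookups += 1
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        # Cache miss: `compute_results` stands in for the backend (assumption).
        page = compute_results(query)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        self.entries[query] = page
        return page

    def hit_rate(self):
        """Fraction of lookups answered from the cache."""
        return self.hits / self.lookups if self.lookups else 0.0


# Toy query stream: only the second occurrence of "a" is a hit.
cache = LRUResultCache(capacity=2)
for q in ["a", "b", "a", "c", "b"]:
    cache.get(q, lambda query: f"<results for {query}>")
print(cache.hit_rate())
```

In this toy stream, the second lookup of "b" is a miss because "b" was evicted when "c" arrived, which shows how the hit rate depends on both the capacity and the order of the queries.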
Recently, some works have also taken into account the fact that the cost of a cache miss depends on the query, i.e., some queries require more computational resources to be answered [1,4]. These works have shown that it is better to tune a caching strategy according to the query processing cost incurred on the backend system, instead of the achieved hit rate alone.

P. Clough et al. (Eds.): ECIR 2011, LNCS 6611, pp. 510–516, 2011. © Springer-Verlag Berlin Heidelberg 2011

¹ By HTML result page, we mean the textual content such as the URLs and snippets of the documents in the result page [3], but not the visual content in the page.

A typical entry in a dynamic result cache stores the HTML result page¹ generated as an answer to a query. A storage-wise profitable alternative to this