Context-Sensitive Information Retrieval Using Implicit Feedback

Xuehua Shen
Department of Computer Science
University of Illinois at Urbana-Champaign

Bin Tan
Department of Computer Science
University of Illinois at Urbana-Champaign

ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign

ABSTRACT

A major limitation of most existing retrieval models and systems is that the retrieval decision is made based solely on the query and document collection; information about the actual user and search context is largely ignored. In this paper, we study how to exploit implicit feedback information, including previous queries and clickthrough information, to improve retrieval accuracy in an interactive information retrieval setting. We propose several context-sensitive retrieval algorithms based on statistical language models to combine the preceding queries and clicked document summaries with the current query for better ranking of documents. We use the TREC AP data to create a test collection with search context information, and quantitatively evaluate our models using this test set. Experimental results show that using implicit feedback, especially the clicked document summaries, can improve retrieval performance substantially.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms
Algorithms

Keywords
Query history, query expansion, interactive retrieval, context

1. INTRODUCTION

In most existing information retrieval models, the retrieval problem is treated as involving one single query and a set of documents. From a single query, however, the retrieval system has only very limited clues about the user’s information need. An optimal retrieval system thus should try to exploit as much additional context information as possible to improve retrieval accuracy, whenever it is available.
Indeed, context-sensitive retrieval has been identified as a major challenge in information retrieval research [2].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR’05, August 15–19, 2005, Salvador, Brazil. Copyright 2005 ACM 1-59593-034-5/05/0008 ...$5.00.

There are many kinds of context that we can exploit. Relevance feedback [14] can be considered a way for a user to provide more search context and is known to be effective for improving retrieval accuracy. However, relevance feedback requires a user to explicitly provide feedback information, such as specifying the category of the information need or marking a subset of retrieved documents as relevant. Since it forces the user to engage in additional activities while the benefits are not always obvious to the user, a user is often reluctant to provide such feedback information. Thus the effectiveness of relevance feedback may be limited in real applications.

For this reason, implicit feedback has attracted much attention recently [11, 13, 18, 17, 12]. In general, the retrieval results using the user’s initial query may not be satisfactory; often, the user needs to revise the query to improve the retrieval/ranking accuracy [8]. For a complex or difficult information need, the user may need to modify his/her query and view ranked documents over many iterations before the information need is completely satisfied.
In such an interactive retrieval scenario, the information naturally available to the retrieval system is more than just the current user query and the document collection. In general, the entire interaction history can be available to the retrieval system, including past queries, information about which documents the user has chosen to view, and even how a user has read a document (e.g., which part of a document the user spends a lot of time reading). We define implicit feedback broadly as exploiting all such naturally available interaction history to improve retrieval results.

A major advantage of implicit feedback is that we can improve the retrieval accuracy without requiring any user effort. For example, if the current query is “java”, without any extra information it would be impossible to know whether it is intended to mean the Java programming language or the Java island in Indonesia. As a result, the retrieved documents will likely be a mixture of both kinds: some about the programming language and some about the island. However, any particular user is unlikely to be searching for both types of documents. Such an ambiguity can be resolved by exploiting history information. For example, if we know that the previous query from the user is “cgi programming”, it would strongly suggest that it is the programming language that the user is searching for.

Implicit feedback has been studied in several previous works. In [11], Joachims explored how to capture and exploit the clickthrough information and demonstrated that such implicit feedback information can indeed improve the search accuracy for a group of people. In [18], a simulation study of the effectiveness of different implicit feedback algorithms was conducted, and several retrieval models designed for exploiting clickthrough information were pro-