Adapting information retrieval systems to user queries

Giridhar Kumaran *, James Allan
Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, USA
Received 1 October 2007; accepted 20 December 2007

Abstract

Users enter queries that are short as well as long. The aim of this work is to evaluate techniques that can enable information retrieval (IR) systems to automatically adapt to perform better on such queries. By adaptation we refer to (1) modifications to the queries via user interaction, and (2) detecting that the original query is not a good candidate for modification. We show that the former has the potential to improve mean average precision (MAP) of long and short queries by 40% and 30% respectively, and that simple user interaction can help towards this goal. We observed that after inspecting the options presented to them, users frequently did not select any. We present techniques in this paper to determine beforehand the utility of user interaction to avoid this waste of time and effort. We show that our techniques can provide IR systems with the ability to detect and avoid interaction for unpromising queries without a significant drop in overall performance.
© 2008 Elsevier Ltd. All rights reserved.

Keywords: Adaptive information retrieval; Query reformulation; Long queries

1. Introduction

The quality of queries submitted to information retrieval (IR) systems directly affects the quality of search results generated (Croft & Thompson, 1987). In conveying complex information needs, users enter queries that would appear perfectly legitimate and understandable to a human being. Unfortunately, in a large number of cases such queries are not handled well by the search engine. While users generally have a model of their information need, they have little or no knowledge about how the underlying IR system works.
This lack of knowledge is usually coupled with another unknown: the contents of the collection being searched. A disconnect thus exists between what users enter as queries and the ideal representation required to retrieve the documents they want (Nordlie, 1999).

Information Processing and Management 44 (2008) 1838–1862. doi:10.1016/j.ipm.2007.12.006
* Corresponding author. Tel.: +1 413 658 7280. E-mail addresses: giridhar@cs.umass.edu (G. Kumaran), allan@cs.umass.edu (J. Allan).

In this paper we are interested in two types of queries, those that are long and those that are short. Shorter queries are more pervasive than longer ones, especially in the web domain. The average query length is