Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 666–676, Jeju Island, Korea, 12–14 July 2012. ©2012 Association for Computational Linguistics

Learning Lexicon Models from Search Logs for Query Expansion

Jianfeng Gao, Microsoft Research, Redmond, Washington 98052, USA, jfgao@microsoft.com
Shasha Xie, Educational Testing Service, Princeton, New Jersey 08540, USA, sxie@ets.org
Xiaodong He, Microsoft Research, Redmond, Washington 98052, USA, xiaohe@microsoft.com
Alnur Ali, Microsoft Bing, Bellevue, Washington 98004, USA, alnurali@microsoft.com

Abstract

This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.

1 Introduction

Term mismatch is a fundamental problem in Web search, where queries and documents are composed using different vocabularies and language styles. Query expansion (QE) is an effective strategy to address the problem. It expands a query issued by a user with additional related terms, called expansion terms, so that more relevant documents can be retrieved.

In this paper we explore the use of clickthrough data and translation models for QE. We select expansion terms for a query according to how likely it is that the expansion terms occur in the title of a document that is relevant to the query. Assuming that a query is parallel to the titles of documents clicked for that query (Gao et al. 2010a), three lexicon models are trained on query-title pairs extracted from clickthrough data.
The first is a word model that learns the translation probability between single words. The second model uses lexicalized triplets to incorporate word dependencies for translation. The third is a bilingual topic model, which represents a query as a distribution of hidden topics and learns the translation between a query and a title term at the semantic level. We will show that the word model provides a rich set of expansion candidates while the triplet and topic models can effectively select good expansion terms, and that a ranker-based QE system which incorporates all three of these models not only significantly improves Web search results but also outperforms other state-of-the-art log-based QE methods.

There is growing interest in applying user logs to improve QE. A recent survey is due to Baeza-Yates and Ribeiro-Neto (2011). Below, we briefly discuss two log-based QE methods that are closest to ours and are re-implemented in this study for comparison. Both systems use the same type of log data that we used to train the lexicon models.

The term correlation model of Cui et al. (2002; 2003) is, to our knowledge, the first to explore query-document relations for direct extraction of expansion terms for Web search. The method outperforms traditional QE methods that do not use log data, e.g., the local analysis model of Xu and Croft (1996). In addition, as pointed out by Cui et al. (2003), there are three important advantages that make log-based QE a promising technology to improve the performance of commercial search engines. First, unlike traditional QE methods that are based on relevance feedback, log-based QE derives expansion terms from search logs, allowing term correlations to be pre-computed offline. Compared to methods that are based on thesauri either compiled manually (Prager et al. 2001) or derived au-