Automatic Query Reformulation with Syntactic Operators to Alleviate Search Difficulty

Huizhong Duan
University of Illinois at Urbana-Champaign
201 N Goodwin Ave, Urbana, IL 61801, USA
duan9@illinois.edu

Rui Li
University of Illinois at Urbana-Champaign
201 N Goodwin Ave, Urbana, IL 61801, USA
ruili1@illinois.edu

ChengXiang Zhai
University of Illinois at Urbana-Champaign
201 N Goodwin Ave, Urbana, IL 61801, USA
czhai@cs.uiuc.edu

ABSTRACT

Modern search engines usually provide a query language with a set of advanced syntactic operators (e.g., a plus sign to require a term’s appearance, or quotation marks to require a phrase’s appearance) which, if used appropriately, can significantly improve the effectiveness of a plain keyword query. However, ordinary users rarely use these operators, owing to their intrinsic difficulty and to users’ lack of corpus statistics. In this paper, we propose to automatically reformulate queries that do not work well by selectively adding syntactic operators. In particular, we propose to perform syntactic operator-based query reformulation when a retrieval system detects that users are encountering difficulty in search, as indicated by behaviors such as scanning over the top k documents without any click-through. We frame the problem of automatic reformulation with syntactic operators as a supervised learning problem, and propose a set of effective features to represent queries with syntactic operators. Experimental results verify the effectiveness of the proposed method and its applicability as a query suggestion mechanism for search engines. As a negative feedback strategy, syntactic operator-based query reformulation also shows promising results in improving search results for difficult queries compared with existing methods.
Categories and Subject Descriptors
H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: Information Search and Retrieval - query formulation

General Terms
Algorithms, Performance, Experimentation

Keywords
Query reformulation, syntactic operator, search difficulty.

1. INTRODUCTION

Query languages of modern search engines usually include a set of advanced syntactic operators to supplement traditional keyword queries [1, 2]. For instance, a necessity operator (a plus sign) preceding a query term requires the term to be present in each relevant document; a phrase operator (a pair of quotation marks) requires that relevant documents contain the phrase consisting of the quoted terms. To distinguish it from a keyword query, we refer to a query with syntactic operators as a syntax query. For example, Figure 1a shows a keyword query, while Figure 1b and Figure 1c show two syntax queries that have the same set of terms as the keyword query shown in Figure 1a. For convenience of discussion, we further denote this type of syntactic operator-based reformulation as syntactic reformulation.

If used appropriately, syntactic reformulation can be very effective for improving retrieval accuracy, turning an otherwise ineffective query into an effective one. Figure 1 shows an example of using syntactic reformulation to improve the top ranked results with a major US search engine¹. As we can see from Figure 1a, none of the top ranked documents is relevant to the query. In contrast, in Figure 1b, by using the syntax query with the necessity constraint on the term “unix”, we are able to find two relevant documents (the 2nd and 3rd) among the top three. This is because the search engine overlooked the term “unix” in the original query, which is typically due to a coarse estimation of term importance. By imposing the necessity constraint, the query emphasizes the importance of the term and re-ranks the results according to whether each document contains it.
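To make the semantics of the necessity and phrase operators described above concrete, the following is a minimal Python sketch that parses a syntax query and checks whether a document satisfies its constraints. It is an illustration only, not the method proposed in this paper and not the behavior of any particular search engine: the names `SyntaxQuery`, `parse`, `contains_phrase`, and `satisfies` are hypothetical, the example query merely reuses terms mentioned in the text, and constraints are treated here as a strict boolean filter, whereas a real engine may combine them with ranking.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SyntaxQuery:
    """Hypothetical container for the parts of a syntax query."""
    required: list = field(default_factory=list)  # terms prefixed with '+'
    phrases: list = field(default_factory=list)   # quoted phrases, each a list of terms
    optional: list = field(default_factory=list)  # plain keywords without operators


# Matches either a quoted phrase or a single term with an optional '+' prefix.
TOKEN = re.compile(r'"([^"]+)"|(\+?)(\w+)')


def parse(query: str) -> SyntaxQuery:
    """Split a query string into required terms, phrases, and plain keywords."""
    parsed = SyntaxQuery()
    for phrase, plus, term in TOKEN.findall(query.lower()):
        if phrase:
            parsed.phrases.append(phrase.split())
        elif plus:
            parsed.required.append(term)
        else:
            parsed.optional.append(term)
    return parsed


def contains_phrase(tokens: list, phrase: list) -> bool:
    """Return True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def satisfies(document: str, query: SyntaxQuery) -> bool:
    """Check the necessity and phrase constraints (assumed to be a strict filter)."""
    tokens = re.findall(r"\w+", document.lower())
    vocabulary = set(tokens)
    has_required = all(term in vocabulary for term in query.required)
    has_phrases = all(contains_phrase(tokens, p) for p in query.phrases)
    return has_required and has_phrases


if __name__ == "__main__":
    # Hypothetical query and documents, for illustration only.
    query = parse('change +unix "default java"')
    documents = [
        "How to change the default Java version on Unix",
        "Java default settings on Windows",
    ]
    for doc in documents:
        print(satisfies(doc, query), "-", doc)
```

Plain keywords are kept in `optional` but do not affect the check, which mirrors the idea that operators add constraints on top of the same set of query terms.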
In Figure 1c, with an even more complicated syntactic reformulation, we further impose a phrase constraint on “default java”. This effectively eliminates the possible ambiguities of the query caused by matching terms separately. As a result, all of the top three ranked documents are relevant. The proper use of syntactic operators not only clarifies the user’s information need, but also gives clues to the retrieval system on how to optimize the search results.

However, very few users make use of syntactic operators in their daily search activities. Statistics² from a search engine query log show that fewer than 0.5% of queries used syntactic operators. This is either because users are unfamiliar with the semantics of the operators, or because they lack the appropriate knowledge and statistics to formulate working syntax queries.

In this paper, we propose to help users take advantage of the rich query syntax by automatically formulating potentially effective syntax queries based on keyword queries. In particular, we propose to automatically perform query reformulation with syntactic operators when users encounter difficulty in search.

¹ http://www.google.com
² Based on a sample of the MSN query log from 2006.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM’11, October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM 978-1-4503-0717-8/11/10...$10.00.

Such