Information Processing and Management 54 (2018) 1–13
A Prospect-Guided global query expansion strategy using word embeddings
Francis C. Fernández-Reyes a, Jorge Hermosillo-Valadez a,*, Manuel Montes-y-Gómez b
a Centro de Investigación en Ciencias-(IICBA), Universidad Autónoma del Estado de Morelos, Av. Universidad 1001, Cuernavaca, Morelos 62209, Mexico
b Instituto Nacional de Astrofísica, Óptica y Electrónica, Santa María Tonantzintla, Puebla 72840, Mexico
Article info
Article history:
Received 10 February 2017
Revised 27 June 2017
Accepted 9 September 2017
Keywords:
Global query expansion
Word embeddings
Information retrieval
Candidate terms pooling methods
Abstract
The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, that are semantically related to the query terms. Word embeddings have recently been used in an attempt to address this problem. Nevertheless, query disambiguation is still necessary: the semantic relatedness of each word in the corpus is modeled, but choosing the right terms for expansion from the standpoint of the un-modeled query semantics remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of prospect vocabulary terms. The proposed method makes it possible to explore query-vocabulary semantic closeness in such a way that new terms, semantically related to more relevant topics, are elicited and added as a function of the query as a whole. The method includes candidate pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora and compared it across different Information Retrieval models and against another expansion technique that also uses word embeddings. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision without relevance feedback.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Over the years, query expansion (QE) methods have been proposed as an effective way to address the query-document
vocabulary mismatch problem in Information Retrieval (IR) tasks (Vechtomova, 2009; White & Horvitz, 2015). The aim is to
enrich the query by adding semantically related words, mainly using synonyms.
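As a rough illustration of what enriching a query with semantically related words can look like, the sketch below expands a query with the vocabulary terms whose embeddings are closest to the centroid of the query terms. It is a generic nearest-neighbour example under stated assumptions, not the method proposed in this paper: the embedding values, the `expand_query` function, and its parameter `k` are hypothetical and chosen only for illustration.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.1, 0.2, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2):
    """Return the query terms plus the k vocabulary terms closest to the query centroid."""
    known = [embeddings[t] for t in terms if t in embeddings]
    if not known:
        # No query term has an embedding, so nothing can be added.
        return list(terms)
    dim = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    # Score every vocabulary term that is not already in the query.
    candidates = [
        (cosine(centroid, vec), word)
        for word, vec in embeddings.items()
        if word not in terms
    ]
    candidates.sort(reverse=True)
    return list(terms) + [word for _, word in candidates[:k]]


expanded = expand_query(["car"], EMBEDDINGS, k=2)
print(expanded)
```

With these toy vectors, a document that mentions only "automobile" would match the expanded query even though it shares no term with the original one, which is the vocabulary mismatch that QE is meant to reduce.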
Approaches to QE can be classified into global and local methods. On the one hand, global methods expand the original
query independently of any retrieval result. Typically, WordNet is the standard exogenous tool of choice for selecting new
terms semantically associated with the original ones (Pal, Mitra, & Datta, 2014). On the other hand, local methods use relevance
feedback, whereby they perform a first retrieval whose outcome is then used for selecting the most promising terms
* Corresponding author.
E-mail addresses: fcaridad@uaem.mx (F.C. Fernández-Reyes), jhermosillo@uaem.mx (J. Hermosillo-Valadez), mmontesg@inaoep.mx (M. Montes-y-Gómez).
http://dx.doi.org/10.1016/j.ipm.2017.09.001
0306-4573/© 2017 Elsevier Ltd. All rights reserved.