Evolving Systems
https://doi.org/10.1007/s12530-019-09292-7
ORIGINAL PAPER
Employing query disambiguation using clustering techniques
Andreas Kanavos¹ · Panagiota Kotoula¹ · Christos Makris¹ · Lazaros Iliadis²
Received: 27 November 2018 / Accepted: 3 July 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
Due to the boundless expansion of the Web in the last decade, the research community has paid significant attention to the
problem of effective searching in the vast amount of available information. In this paper, we introduce a novel framework
for improving information retrieval results. Initially, relevant documents are organized in clusters utilizing several metrics
combined with language modelling tools. Subsequently, a ranked list of the documents is produced and returned to the user
for a specific query. To this end, the scores between the clusters and the query representations are extracted; then, the
internal rankings of the documents per cluster are combined, using these scores as weighting factors. Our proposed
methodology is based on the exploitation of inter-document similarities (lexical and/or semantic) after a sophisticated
pre-processing step. Our experimental evaluation demonstrates that the proposed algorithm can efficiently improve the
quality of the retrieved results.
Keywords Query disambiguation · Information retrieval · Query reformulation · Clustering · Containment · Semantics
1 Introduction
Search engines are invaluable tools for retrieving information
from the Web. However, they do not handle ambiguous queries
efficiently: such queries usually return web page references
that map to different meanings, mixed together in a single
answer list. More specifically, extracting knowledge and
grouping the results returned by a search engine into groups
or a hierarchy of labelled clusters is a very important task
that modern search engines have recently started taking into
consideration.¹ With the use of category-clustered results,
the user may focus on a general topic by entering a generic
query and then selecting the results that best match their
interest.
Improving the quality of ranking in Information Retrieval
results is one of the most popular research issues. In this
setting, an information need is expressed in the form of
queries submitted to a search engine or platform with the
purpose of receiving any available information related to
the query (Baeza-Yates and Ribeiro-Neto 2011; Manning et al.
2008). The problem, as well as the challenge, in this process
is the capability of the search engine to respond and
subsequently deliver the fittest set of information for the
specific query, if this information actually exists.
On the other hand, users who post queries often lack the
corresponding experience and thus cannot be expected to know
the best format in which to phrase their input query. This
may be either because they cannot express their intention
clearly or because they do not leverage the full potential
of the search platform. The search engine’s greatest
challenge is then to understand the users’ intention through
this given input, namely the query itself; that is, to
disambiguate the terms that compose the query and attempt to
satisfy the query request.
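The ranking scheme summarized in the abstract (score each cluster against the query, then use that score to weight the internal ranking of the cluster's documents) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the clusters are given by hand, both the cluster and the document scores are plain bag-of-words cosine similarities, and the names `bow`, `cosine` and `rank_with_clusters` are hypothetical. The metrics and language modelling tools actually used by the framework are not reproduced.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_with_clusters(query, clusters):
    """Rank documents by (cluster-query score) * (document-query score).

    `clusters` maps a hypothetical cluster label to its list of documents.
    Both scores are cosine similarities, which is an illustrative choice.
    """
    q = bow(query)
    scored = []
    for label, docs in clusters.items():
        # Represent each cluster by the summed term counts of its documents.
        cluster_repr = Counter()
        for doc in docs:
            cluster_repr.update(bow(doc))
        cluster_score = cosine(q, cluster_repr)
        for doc in docs:
            # The cluster score acts as a weight on the document's own score.
            scored.append((cluster_score * cosine(q, bow(doc)), label, doc))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy data: the ambiguous term "jaguar" appears in two hand-made clusters.
clusters = {
    "animal": ["jaguar cat habitat rainforest", "big cat species jaguar"],
    "car": ["jaguar car dealer price", "luxury car engine review"],
}
ranking = rank_with_clusters("jaguar car price", clusters)
for score, label, doc in ranking:
    print(f"{score:.3f}  {label:6s}  {doc}")
```

In this toy example the extra query terms pull the "car" cluster ahead of the "animal" cluster, so documents about the vehicle are ranked first even though every cluster mentions the ambiguous term.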
A preliminary version of this paper was presented at the 14th
International Conference on Artificial Intelligence Applications
and Innovations, AIAI 2018, Rhodes, Greece, May 25–27, 2018.
* Andreas Kanavos
kanavos@ceid.upatras.gr
Panagiota Kotoula
kotoula@ceid.upatras.gr
Christos Makris
makri@ceid.upatras.gr
Lazaros Iliadis
liliadis@civil.duth.gr
¹ Computer Engineering and Informatics Department,
University of Patras, 26504 Patras, Greece
² Department of Civil Engineering, Democritus University
of Thrace, 67100 Xanthi, Greece
¹ Google: https://www.google.com/search/about/.