Using Tag-Neighbors for Query Expansion in
Medical Information Retrieval
Frederico Durao, Karunakar Bayyapu, Guandong Xu, Peter Dolog, Ricardo Lage
Department of Computer Science
Aalborg University
Selma Lagerlöfs Vej 300
Email: {fred,kreddy,xu,dolog,ricardol}@cs.aau.dk
Abstract—In the context of medical document retrieval, users’
under-specified queries often lead to undesired search results that
lack the information they seek, match the domain knowledge
inadequately, or come from unreliable sources. To overcome
the limitations of under-specified queries, we utilize tags to
enhance information retrieval capabilities by expanding users’
original queries with context-relevant information. We compute
a set of significant tag neighbor candidates based on neighbor
frequency and weight, and use the most frequent and most heavily
weighted neighbors to expand an input query whose terms match tags.
The proposed approach is evaluated on the MedWorm medical
article collection using standard evaluation methods from the
Text REtrieval Conference (TREC). Against a baseline Mean Average
Precision (MAP) of 0.353, query expansion reaches a MAP of 0.491
(+39%). In-depth analysis shows how this strategy is beneficial
when the retrieval results are compared at different ranks.
I. INTRODUCTION
In the context of medical document retrieval, users’
under-specified queries often lead to undesired search results that
lack the information they seek, match the domain knowledge
inadequately, or come from unreliable sources. For
instance, when a user wants to search for a recent outbreak
of influenza on the web, a search with the query influenza
will return a list of documents containing the query term,
ranked by a set of criteria defined by the search engine. In
this case, at least three issues may affect the quality of the
search result. First, a query with only one or two terms may
be under-specified, that is, it may not contain enough terms
for the search engine to retrieve the information the user
wants. Second, the document repository of the search
engine might hold hundreds of thousands of articles
matching the query. With that much
information, it is infeasible to locate the desired information
by simply browsing through all the returned results.
Third, domain knowledge requirements come into play.
Because conventional search engines focus on generic
information search, domain-specific results are usually not
taken into consideration during the search. Thus, a simple
word-based search does not produce relevant search results
in specific domains such as the medical domain [1]. As a
consequence of these issues related to query-based searches,
only one fourth to one half of the relevant articles on a given
topic are retrieved in searches performed in specific domains
[2]. In other words, sparse and incomplete query terms may
cause information overload, increasing the noise present in
search results. Refining the query therefore becomes all the
more important in such scenarios.
To overcome the limitations of under-specified queries,
we utilize tag neighbors to enhance information retrieval
capabilities by expanding the user’s original query. Tags are
free-style terms that users assign as annotations to express their
own perceptions or conceptual judgments about the tagged
resources. We focus on medical document collections, e.g.
PubMed¹ and MedWorm², because in searching these collections
it is often desirable to retrieve only those documents
pertaining to a specific medical area. In these collections, the tags
that users assign to documents are typically
related to the domain(s) each user is interested in. That is,
users are able to choose their own free-style terms (i.e. tags),
which are associated with the domain(s) of their interest.
The purpose of query expansion is to bridge the gap between the
queries users enter and the relevant documents to be retrieved.
In a nutshell, we compute a set of significant tag neighbor
candidates based on tag neighbor frequency and weight, and
use the most frequent and most heavily weighted tag neighbors to
expand an input query whose terms match tags. For instance, if a
user submits the query influenza, our method automatically
maps it to the higher-frequency tag neighbor term contagious.
The search is thus refined by retrieving
documents that contain the words influenza and contagious in their
contents. Furthermore, neighbor terms are also searchable. Taking
the previous query as an example, documents indexed with
medical terms that include the word influenza (e.g. influenza
contagious viral) will also be returned, depending on the
neighbor frequency and weight.
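The influenza example can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation evaluated in this paper: it assumes that two tags are neighbors when they are assigned to the same document, and it ranks neighbors by raw co-occurrence count alone, since the weight component is not defined at this point in the text. The names build_neighbor_counts, expand_query and top_k, as well as the toy tag data, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_neighbor_counts(tagged_docs):
    """Count how often each ordered pair of tags appears on the same document.

    Assumption: "neighbor" means co-assigned to one document.
    """
    neighbors = defaultdict(Counter)
    for tags in tagged_docs:
        # set() avoids counting a tag repeated on the same document twice
        for tag, other in permutations(set(tags), 2):
            neighbors[tag][other] += 1
    return neighbors


def expand_query(query, neighbors, top_k=1):
    """Append the top_k most frequent neighbors of every query term that matches a tag."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Terms that match no tag have no neighbors and are left unexpanded
        ranked = neighbors.get(term, Counter()).most_common(top_k)
        for neighbor, _count in ranked:
            if neighbor not in expanded:
                expanded.append(neighbor)
    return expanded


# Hypothetical tag assignments, one list of tags per document
tagged_docs = [
    ["influenza", "contagious", "viral"],
    ["influenza", "contagious"],
    ["influenza", "vaccine"],
    ["diabetes", "insulin"],
]

neighbors = build_neighbor_counts(tagged_docs)
print(expand_query("influenza", neighbors))  # ['influenza', 'contagious']
```

In this toy data, contagious co-occurs with influenza twice while viral and vaccine co-occur once each, so contagious is the single term added. A query term that matches no tag is passed through unchanged.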
In this paper, the expansion terms are selected
from a large number of tags provided by the users. We
propose the tag neighbors method for selecting high-frequency
terms. With this method, we choose good
expansion terms from the candidate neighbors according to
their potential impact on retrieval effectiveness. We implement
our method in a search system with contents extracted and
indexed from the MedWorm medical article database. We
¹ www.ncbi.nlm.nih.gov/pubmed
² www.medworm.com
978-1-4244-9224-4/11/$26.00 ©2011 IEEE