Query expansion using external resources for improving
information retrieval in the biomedical domain
Khadim Dramé, Fleur Mougin, Gayo Diallo
ERIAS, INSERM U897, ISPED, University of Bordeaux
146 rue Leo Saignat 33076, Bordeaux
firstname.lastname@u-bordeaux.fr
Abstract. This paper presents the first participation of the ERIAS team in task
3 of the ShARe/CLEF eHealth Evaluation Lab 2014. The goal of this task is to
evaluate the effectiveness of Information Retrieval systems in helping patients
easily access relevant information. We propose a method that exploits
external resources to improve information retrieval in the biomedical domain.
The proposed approach is based on the well-known Vector Space Model
and uses two extensions of this model to enhance its performance. Specifically,
the MeSH thesaurus is used for query expansion with different configurations.
Experiments on a large collection of documents show that the use
of these external resources can improve performance in medical information
retrieval.
Keywords: information retrieval, Lucene, Vector Space Model, n-gram extraction,
query expansion, MeSH thesaurus.
1 Introduction
The role of an Information Retrieval (IR) system is to help users access relevant
information corresponding to their needs. In the medical domain, accessing useful
information becomes increasingly important with the growing amount of available
information. To tackle this challenging issue, different approaches have been
proposed, which raises the challenge of assessing their performance. The ShARe/CLEF
(Cross-Language Evaluation Forum) eHealth Lab [1] is an evaluation campaign in the
biomedical domain which aims at helping patients (and their relatives) understand their
health-related information. In particular, the goal of the third task is to develop methods
that facilitate patients' access to valuable information regarding their
health [2]. Indeed, the amount of biomedical information is growing rapidly with an
abundant production of digital document collections. Accessing useful information
among this large amount of available data becomes a real challenge. To address it,
controlled vocabularies, such as the Medical Subject Headings (MeSH) thesaurus, are
widely used to improve medical information retrieval. In [3], the authors
proposed the use of the MeSH thesaurus for expanding user queries. Terms associated