T. Yakhno (Ed.): ADVIS 2002, LNCS 2457, pp. 114–122, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Cross-Language Information Retrieval Using Multiple
Resources and Combinations for Query Expansion
Fatiha Sadat¹, Masatoshi Yoshikawa¹,², and Shunsuke Uemura¹
¹ Graduate School of Information Science, Nara Institute of Science and Technology
(NAIST), 8916-5 Takayama, Ikoma, Nara 630-0101, Japan
² Information Technology Center, Nagoya University
{fatia-s, yosikawa, uemura}@is.aist-nara.ac.jp
Abstract. As Internet resources become accessible to more and more countries,
there is a need to develop efficient methods for information retrieval across
languages. In the present paper, we focus on query expansion techniques to
improve the effectiveness of information retrieval. A combination of
dictionary-based translation and statistics-based disambiguation is indispensable
for overcoming translation ambiguity. We propose a model that uses multiple
sources for query reformulation and expansion in order to select expansion terms
and retrieve the information needed by a user. Relevance feedback,
thesaurus-based expansion, and a new feedback strategy based on the extraction
of domain keywords to expand the user's query are introduced and evaluated. We
tested the effectiveness of the proposed combined method by applying it to
French-English information retrieval. Experiments using the CLEF data collection
showed the proposed combined query expansion techniques to be highly effective.
1 Introduction
With the explosive growth in the number of international users, in distributed
information, and in the linguistic resources available for research through the
World Wide Web, information retrieval has become a crucial task for meeting users'
needs: finding, retrieving, and understanding relevant information in whatever
language and form.
Cross-Language Information Retrieval (CLIR) consists of providing a query in one
language and searching document collections in one or multiple languages. Some
form of translation is therefore required. In this paper, we focus on query
translation using bilingual Machine Readable Dictionaries (MRDs), combined with
statistics-based disambiguation to resolve the ambiguity that polysemy introduces
after translation. Automatic query expansion, which is known to be among the most
important methods for overcoming the word mismatch problem in information
retrieval, is a major focus of this work. The proposed approach is general across
languages; however, we have conducted our experiments and evaluations on French
and English.
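To make the idea of dictionary-based translation with statistical disambiguation concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes a toy French-English dictionary and a table of raw co-occurrence counts from a target-language corpus. The names BILINGUAL_DICT, COOCCURRENCE, cooccurrence_score, and translate_query, as well as all entries and counts, are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy bilingual dictionary: each French term maps to its
# candidate English translations (an MRD would supply these in practice).
BILINGUAL_DICT = {
    "avocat": ["lawyer", "avocado"],
    "cour": ["court", "yard"],
}

# Hypothetical co-occurrence counts taken from a target-language corpus.
# Pairs that are absent are treated as never co-occurring.
COOCCURRENCE = {
    frozenset(("lawyer", "court")): 42,
    frozenset(("avocado", "yard")): 1,
}


def cooccurrence_score(candidate):
    """Sum the pairwise co-occurrence counts of one candidate translation."""
    total = 0
    for i in range(len(candidate)):
        for j in range(i + 1, len(candidate)):
            pair = frozenset((candidate[i], candidate[j]))
            total += COOCCURRENCE.get(pair, 0)
    return total


def translate_query(source_terms):
    """Pick the combination of translations whose terms co-occur most."""
    # Terms missing from the dictionary are kept untranslated (assumption).
    options = [BILINGUAL_DICT.get(term, [term]) for term in source_terms]
    # Enumerate every combination of candidates and keep the best-scoring one.
    best = max(product(*options), key=cooccurrence_score)
    return list(best)


if __name__ == "__main__":
    print(translate_query(["avocat", "cour"]))
```

Exhaustive enumeration is used here only because the example query is short. The number of combinations grows multiplicatively with query length, so a practical system would need a cheaper selection strategy and a better-normalized association measure than raw counts.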