J Intell Inf Syst (2012) 39:209–247
DOI 10.1007/s10844-011-0189-9
Towards an effective automatic query expansion
process using an association rule mining approach
Chiraz Latiri · Hatem Haddad · Tarek Hamrouni
Received: 4 May 2011 / Revised: 20 November 2011 / Accepted: 24 November 2011 /
Published online: 20 December 2011
© Springer Science+Business Media, LLC 2011
Abstract The steady growth in the size of textual document collections is a key
progress-driver for modern information retrieval techniques whose effectiveness and
efficiency are constantly challenged. Given a user query, the number of retrieved
documents can be overwhelmingly large, hampering their efficient exploitation by
the user. In addition, retaining only relevant documents in a query answer is of
paramount importance for effectively meeting the user's needs. In this situation,
the query expansion technique offers an interesting solution for obtaining a complete
answer while preserving the quality of retained documents. This mainly relies on an
accurate choice of the terms added to an initial query. Interestingly enough, query
expansion takes advantage of large text volumes by extracting statistical information
about index term co-occurrences and using it to make user queries better fit the real
information needs. In this respect, a promising direction is the application of
data mining methods to the extraction of dependencies between terms. In this paper,
we present a novel approach for mining knowledge supporting query expansion that
is based on association rules. The key feature of our approach is a better trade-off
between the size of the mining result and the conveyed knowledge. To this end, our
association rule mining method implements results from Galois connection theory
and compact representations of rule sets in order to reduce the huge number of
potentially useful associations. An experimental study applied our approach to
automatic query expansion on some real collections.
The results of the study show a significant improvement in the
C. Latiri · H. Haddad · T. Hamrouni (✉)
URPAH Team, Computer Sciences Department, Faculty of Sciences of Tunis,
El Manar University, Tunis, Tunisia
e-mail: tarek.hamrouni@fst.rnu.tn
C. Latiri
e-mail: chiraz.latiri@gnet.tn
H. Haddad
e-mail: haddad.hatem@gmail.com