N. Fuhr et al. (Eds.): INEX 2004, LNCS 3493, pp. 88 – 99, 2005.
© Springer-Verlag Berlin Heidelberg 2005
Logic-Based XML Information Retrieval for
Determining the Best Element to Retrieve
Maryam Karimzadegan¹, Jafar Habibi¹, and Farhad Oroumchian²
¹ Department of Computer Engineering, Sharif University of Technology,
Azadi Street, Tehran, Iran
{karimzadegan, habibi}@ce.sharif.edu
² University of Wollongong in Dubai, Dubai
FarhadOroumchian@uowdubai.ac.ae
Abstract. This paper presents the UOWD-Sharif team’s approach to XML
information retrieval. The approach extends PLIR, an experimental
knowledge-based information retrieval system. Like PLIR, the system uses
plausible inferences, first to infer the relevance of sentences in XML
documents and then to propagate that relevance to the other textual units in
the document tree. Two approaches have been used to propagate confidence.
The first, labeled “propagate-DS”, first propagates the confidence from
sentences to the elements above them and then combines these pieces of
evidence by applying the Dempster-Shafer theory of evidence to estimate the
confidence in each element. The second, “DS-propagate”, first applies the
Dempster-Shafer theory of evidence to combine the pieces of evidence and then
propagates the combined confidence to the parent element. The second
approach performs relatively better than the first.
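The difference between the two orderings can be illustrated with a minimal sketch. It is not the system's implementation and rests on assumptions that the text above does not state: each child contributes a single confidence value in [0, 1] that supports only the hypothesis "relevant" (a simple support function, for which Dempster's rule reduces to 1 - (1 - s1)(1 - s2)...), and propagation to a parent is modeled as multiplication by a hypothetical damping factor. The function names and the factor value are illustrative only.

```python
def ds_combine(confidences):
    """Combine confidences with Dempster's rule, assuming every piece of
    evidence is a simple support function for the same hypothesis
    ("relevant"); the remaining mass stays on the whole frame."""
    uncommitted = 1.0
    for c in confidences:
        uncommitted *= (1.0 - c)
    return 1.0 - uncommitted


def propagate(confidence, factor=0.5):
    """Hypothetical propagation step: damp a confidence by a fixed factor
    when it moves from a child to its parent (an assumption)."""
    return confidence * factor


def propagate_ds(child_confidences, factor=0.5):
    """First propagate each child's confidence, then combine at the parent."""
    return ds_combine(propagate(c, factor) for c in child_confidences)


def ds_propagate(child_confidences, factor=0.5):
    """First combine the children's confidences, then propagate the result."""
    return propagate(ds_combine(child_confidences), factor)


if __name__ == "__main__":
    children = [0.6, 0.5]  # made-up sentence confidences
    print(propagate_ds(children))
    print(ds_propagate(children))
```

Under these assumptions the two orderings generally give different parent confidences for the same children, which is why the order of combination and propagation matters.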
1 Introduction
The widespread use of Extensible Markup Language (XML) has brought up a number
of challenges for information retrieval systems. XML retrieval systems exploit the
logical structure of documents instead of treating each document as a whole. In
traditional information retrieval (IR), a document is considered an atomic unit and is
returned to the user as a query result. XML assumes a tree-like structure for
documents, made up of units such as sentences, paragraphs, and sections. XML
retrieval is therefore concerned not only with finding relevant documents but also
with finding the most appropriate unit in the document that satisfies a user’s
information need. A meaningful retrievable unit shouldn’t be too small, because it
might then not cover all aspects of the user’s need (exhaustivity). It shouldn’t be too
large either, because it could then contain a lot of non-relevant information that is of
no particular interest to the user’s current information need (specificity). XML
retrieval is thus an approach for providing more focused information than search
engines traditionally offer when the structure of the documents is known [4].
We have used the INEX collection for evaluation of our XML retrieval system.
The INEX document collection is made up of the full-texts, marked up in XML that