N. Fuhr et al. (Eds.): INEX 2004, LNCS 3493, pp. 88-99, 2005. © Springer-Verlag Berlin Heidelberg 2005

Logic-Based XML Information Retrieval for Determining the Best Element to Retrieve

Maryam Karimzadegan¹, Jafar Habibi¹, and Farhad Oroumchian²

¹ Department of Computer Engineering, Sharif University of Technology, Azadi Street, Tehran, Iran
{karimzadegan, habibi}@ce.sharif.edu
² University of Wollongong in Dubai, Dubai
FarhadOroumchian@uowdubai.ac.ae

Abstract. This paper presents the UOWD-Sharif team’s approach to XML information retrieval. The approach extends PLIR, an experimental knowledge-based information retrieval system. Like PLIR, the system uses plausible inferences to first infer the relevance of sentences in XML documents and then propagates that relevance to the other textual units in the document tree. Two approaches have been used for propagating confidence. The first, labeled “propagate-DS”, first propagates the confidence from sentences to upper elements and then combines these pieces of evidence by applying the Dempster-Shafer theory of evidence to estimate the confidence in each element. The second, “DS-propagate”, first applies the Dempster-Shafer theory of evidence to combine the evidence and then propagates the combined confidence to the parent element. The second approach performs relatively better than the first.

1 Introduction

The widespread use of Extensible Markup Language (XML) has brought up a number of challenges for information retrieval systems. These systems exploit the logical structure of documents instead of treating each document as a whole. In traditional information retrieval (IR), a document is considered an atomic unit and is returned to a user as a query result. XML assumes a tree-like structure for documents, composed of, for example, sentences, paragraphs, and sections.
Therefore, XML retrieval is concerned not only with finding relevant documents but also with finding the most appropriate unit in the document that satisfies a user’s information need. A meaningful retrievable unit should not be too small, because it might then fail to cover all aspects of the user’s need (exhaustivity). Nor should it be too large, because it could then contain a lot of non-relevant information that is of no particular interest to the user’s current information need (specificity). XML retrieval is thus an approach for providing more focused information than search engines traditionally offer when the structure of the documents is known [4]. We have used the INEX collection to evaluate our XML retrieval system. The INEX document collection is made up of the full-texts, marked up in XML that