Investigating the Use of Summarisation for Interactive XML Retrieval

Zoltán Szlávik
Queen Mary University of London
London, E1 4NS UK
zolley@dcs.qmul.ac.uk

Anastasios Tombros
Queen Mary University of London
London, E1 4NS UK
tassos@dcs.qmul.ac.uk

Mounia Lalmas
Queen Mary University of London
London, E1 4NS UK
mounia@dcs.qmul.ac.uk

ABSTRACT

As the number of components in XML documents is much larger than that of ‘flat’ documents, we believe it is essential to provide users of XML information retrieval systems with overviews of the content of retrieved elements. In this paper, we investigate the use of summarisation in XML retrieval as a means of helping users in their searching process.

1. INTRODUCTION

As the eXtensible Markup Language (XML) is becoming increasingly widespread, retrieval engines that allow search within collections of XML documents are being developed. XML documents contain not only textual information, like in ‘flat’ documents, but also information about the logical structure of the documents. The logical structure is a tree-like structure encoded by XML tags. For example, an article can be seen as corresponding to the root of the tree, and sections, subsections and paragraphs can be arranged in branches and leaves of the tree. The logical units, called elements, provide document portions that may be better to retrieve than the whole XML document itself, i.e., some elements can themselves be answers to an information need while the rest of the document may contain non-, or partially, relevant information. Thus, in XML retrieval, document components, rather than whole documents, are retrieved. This content-based retrieval of XML documents has received interest over the last few years, mainly through the INEX initiative [4].
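To make the tree view of a document concrete, the following minimal sketch parses a small article and enumerates every element together with a positional path. The tag names, the sample text and the `walk` helper are assumptions made for illustration; they are not the INEX document format or part of the system described in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical article: tag names and text are illustrative only.
ARTICLE = """
<article>
  <title>A sample article</title>
  <section>
    <title>First section</title>
    <paragraph>XML tags encode the logical structure.</paragraph>
    <paragraph>Each element is a candidate retrieval unit.</paragraph>
  </section>
  <section>
    <title>Second section</title>
    <paragraph>Leaves of the tree hold most of the text.</paragraph>
  </section>
</article>
"""


def walk(element, path):
    """Yield (path, element) for an element and its descendants, in document order."""
    yield path, element
    seen = {}  # how many children with each tag have been visited so far
    for child in element:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")


root = ET.fromstring(ARTICLE)
elements = list(walk(root, "/article[1]"))
paths = [path for path, _ in elements]

for path in paths:
    print(path)
```

Every path printed here identifies one element that a retrieval engine could return on its own, which is why the number of retrievable units grows far beyond the number of documents.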
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’06 April 23-27, 2006, Dijon, France. Copyright 2006 ACM 1-59593-108-2/06/0004 ...$5.00.

As the number of XML components is typically large (much larger than that of documents), we believe it is essential to provide users of XML information retrieval systems with overviews of the contents of the retrieved elements. One approach is to use summarisation, which has been shown to be useful in interactive information retrieval (IIR) [6, 5, 10].

In this paper, we investigate the use of summarisation in XML retrieval in an interactive environment. In standard IIR, a summary is usually associated with each document returned by the retrieval system; in interactive XML retrieval, a summary can be associated with each document component returned by the XML retrieval system. Because of the nature of XML documents, users can, in addition to accessing any retrieved element, browse within the XML document containing that element. One way to support browsing within XML documents is to display the logical structure of the document containing the retrieved elements. This has the additional benefit of providing (sometimes necessary) context to users when reading a component. Therefore, summaries can also be associated with the other elements forming the document, in addition to the returned components themselves.
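The idea of attaching a summary to every element of the containing document, not only to the retrieved one, can be sketched as follows. This is an illustration under stated assumptions: `naive_summary` simply takes the first few words of an element's text and stands in for a real summariser, and the document, tag names and `outline` helper are hypothetical rather than taken from the experimental system.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag names and text are illustrative only.
DOC = (
    "<article><title>XML retrieval</title>"
    "<section><title>Background</title>"
    "<paragraph>Elements are the logical units of an XML document "
    "and can be returned as answers.</paragraph>"
    "</section></article>"
)


def naive_summary(element, max_words=6):
    """Placeholder summariser (an assumption): the first max_words words of the text."""
    words = " ".join(element.itertext()).split()
    suffix = " ..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


def outline(element, retrieved, depth=0):
    """Yield one display line per element: indentation, tag, summary, hit marker."""
    marker = " <== retrieved" if element is retrieved else ""
    yield f"{'  ' * depth}{element.tag}: {naive_summary(element)}{marker}"
    for child in element:
        yield from outline(child, retrieved, depth + 1)


root = ET.fromstring(DOC)
hit = root.find("./section/paragraph")  # pretend the engine returned this element
lines = list(outline(root, hit))
print("\n".join(lines))
```

The indented outline plays the role of a structural display: the retrieved paragraph is marked, while the summaries of its ancestors and siblings give the surrounding context.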
The aims of our investigation are twofold: 1) regarding summarisation, we examine whether summarisation is useful when browsing within XML documents; 2) regarding structural information and summarisation, we want to know which structural levels summaries should be applied to, and how closely the structural display and the use of summaries are related to each other in an interactive search process. To answer these questions, an interactive information retrieval system was developed and evaluated with human searchers.

The paper is organised as follows. In Section 2, we describe the experimental system that was used and, in Section 3, the experimental design. We show the results in Section 4, followed by discussion. We finish with future work.

2. EXPERIMENTAL SYSTEM

In this section we describe the system that was used in our study: the user interface with XML-specific features, the summarisation method and the XML search engine.

2.1 User Interface

The user interface is a web-based system which passes the query to the retrieval module, processes and displays the retrieved result list and shows the result elements. The system allows users to enter a search query and start the retrieval process by clicking on the search button. The result list display is similar to standard web search interfaces, to minimise the frustration searchers may experience when learning how to use a new system. For each result element, the following are shown: rank, retrieval score, query-biased summary, title and path of the XML document that contains the result