Generating resource descriptions from metadata to support relevance assessments in retrieval

Alison Cawsey, Diana Bental & Bruce Eddy
Department of Computing and Electrical Engineering
Heriot-Watt University
Riccarton, Edinburgh, Scotland EH14 7AS
{alison,diana,ceebde1}@cee.hw.ac.uk

Patrick McAndrew
Institute for Educational Technology
The Open University
Walton Hall, Milton Keynes, England, MK7 6AA
P.McAndrew@open.ac.uk

Abstract

We describe methods for presenting descriptions of online resources that help the user assess their likely relevance without having to download them. These descriptions are based on metadata describing a resource. Two approaches are explored. The first uses current XML-based standards and tools (XSLT and RDF) to offer tailored tabular presentations from selected metadata. The second uses natural language generation techniques to create concise textual descriptions. Both approaches tailor descriptions according to user interests using a simple user profile based on stereotypes.

Introduction

Searching for relevant online documents, whether multimedia or text, is an interactive process involving relevance judgements by the user as well as the search engine. A search engine may retrieve a list of documents which are rated as relevant to the query, but it is then up to the user to examine this imperfect list to find those that are most likely to be genuinely useful given their information need. These may then be downloaded, and a further judgement made as to whether the query needs revising in the light of results.

One obstacle in this process is that the information provided by the search engine about each resource is often not sufficient to allow the user to assess its possible relevance prior to download; often just the title and a few lines of text from the resource are given, or a thumbnail image.
A consequence of this is that user time and bandwidth are wasted in downloading resources which turn out to be irrelevant to the user's information need. The problem is especially acute where the resources consist primarily of multimedia components, with consequent poor descriptions and high download times.

There is already some work addressing this problem. Summarisation techniques may be used for text resources, with query-directed summaries describing document content in a way that depends on the user's query (e.g., Sanderson, 1998). While promising, the summaries produced are limited to what can be extracted from the text, and thus ignore information about the resource which may be external to it. Generating genuine summaries (as contrasted with concatenated extracted fragments of the document) also requires domain-specific knowledge, in order to robustly extract the key information from a document. Moreover, because the summarisation methods work by extracting text fragments, they are not useful for multimedia resources where the text represents only a small part of the information content of the resource.

Another approach that is currently being pursued to support better search and retrieval is the use of rich metadata. Metadata is data about the resource, such as the topic, author, and date of last