RETRIEVING DOCUMENTS BY CONSTRAINED SPREADING ACTIVATION ON AUTOMATICALLY CONSTRUCTED HYPERTEXTS

Fabio Crestani
Department of Computing Science
University of Glasgow
Glasgow G12 8QQ, Scotland
Tel. +44-141-330 6292
Fax +44-141-330 4913
Email: fabio@dcs.gla.ac.uk

Abstract

We report on the design of a system that enables the user to retrieve items of interest both by browsing a very large, automatically constructed hypertext and by spreading activation on the hypertext network itself. The second option is particularly interesting, since it enables the user to retrieve items that have not been visited but that are similar to items flagged as relevant during browsing. The spreading of activation from relevant items to similar items is achieved using a form of "constrained spreading activation". This is a kind of spreading activation controlled by heuristic rules that limit the activation and direct it towards the most promising links and nodes of the hypertext.

1 Introduction

In associative retrieval, associations among information items are often represented as a network, in which information items are represented by nodes and associations by links connecting the nodes. The heuristic rule of retrieving items associated with those assessed as relevant is often implemented by means of a technique called spreading activation. The purpose of this paper is to describe the design of a system that enables associative retrieval by "constrained" spreading activation on a very large, automatically constructed hypertext.

At present, interest in the use of spreading activation on large networks (semantic networks, for example) is decreasing. This is mainly because constructing a network of associations among information items is a very time-consuming process when the document collection is very large.
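To make the idea of spreading activation under constraints concrete, the following is a minimal Python sketch. It is not the system described in this paper: the passage above does not state the actual heuristic rules, so the three constraints used here (a distance limit, a fan-out limit, and an activation threshold), the `decay` factor, and all function and parameter names are assumptions chosen for illustration.

```python
from collections import defaultdict


def constrained_spreading_activation(links, seeds, max_distance=2,
                                     max_fan_out=3, threshold=0.1, decay=0.5):
    """Spread activation from relevant nodes over a weighted network.

    links: dict mapping node -> {neighbour: link weight in [0, 1]}
    seeds: dict mapping node -> initial activation (items flagged as relevant)
    All constraints and the decay factor are illustrative assumptions.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_distance):  # distance constraint: limited number of pulses
        incoming = defaultdict(float)
        for node, level in frontier.items():
            # Fan-out constraint: follow only the strongest outgoing links.
            strongest = sorted(links.get(node, {}).items(),
                               key=lambda kv: kv[1], reverse=True)[:max_fan_out]
            for neighbour, weight in strongest:
                incoming[neighbour] += level * weight * decay
        frontier = {}
        for node, value in incoming.items():
            if node in activation:
                continue  # a node is activated at most once
            if value >= threshold:  # threshold constraint: drop weak activation
                activation[node] = value
                frontier[node] = value
        if not frontier:
            break
    # Rank the activated nodes that were not in the starting set.
    found = [(n, a) for n, a in activation.items() if n not in seeds]
    return sorted(found, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical five-document network; "d1" has been flagged as relevant.
links = {
    "d1": {"d2": 0.8, "d3": 0.4},
    "d2": {"d4": 0.5},
    "d3": {"d4": 0.5},
    "d4": {"d5": 0.9},
}
ranked = constrained_spreading_activation(links, {"d1": 1.0})
for name, level in ranked:
    print(name, round(level, 3))
```

In this toy network the distance limit of two pulses stops the activation before it reaches `d5`, which illustrates how a constraint keeps the spreading from flooding the whole network.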
Most of the original work in associative retrieval was performed with small document collections, and often the associations among the information items were set up manually or semi-automatically [9]. This, of course, becomes impossible when the document collection is very large. However, more and more computing power is now becoming available and its cost is rapidly decreasing, making it possible to construct associative networks from large document collections automatically. In [1, 2] we presented a methodology and a tool for the automatic construction of hypertexts to be used for information retrieval purposes. With such a tool, it is possible to build a large hypertext from a flat collection of textual documents in a completely automatic way.

Previously at Dipartimento di Elettronica e Informatica, Università di Padova, Italy.

2 Information Retrieval and Hypertexts

Information Retrieval (IR; for a good overview see [5]) is a science that aims to store, and allow fast access to, large amounts of unstructured information. This information can be of any kind: textual, visual, or auditory. Most current IR systems store and enable the retrieval of only textual information items, called documents. Even so, the task is not simple,