294 Journal of Digital Information Management Volume 6 Number 4 August 2008

Map-based Interfaces for Information Management in Large Text Collections

Rudolf Mayer, Angela Roiger, Andreas Rauber
Institute of Software Technology and Interactive Systems
Vienna University of Technology, Vienna, Austria
mayer@ifs.tuwien.ac.at, angela@roiger.at, rauber@ifs.tuwien.ac.at

ABSTRACT: The Self-Organising Map (SOM) has been proposed as an alternative interface for exploring Digital Libraries and other large document collections, complementing conventional search and browsing. With advanced visualisations that help the user understand the contents and structure of the map, and with advanced interaction modes such as zooming, panning and area selection, the SOM becomes a feasible alternative to classical search interfaces. Several applications show the SOM’s utility for this task. However, there are still shortcomings in helping the user understand the map, which is essential to fully exploit the SOM’s potential as an Information Management tool. In particular, too few methods have been developed for describing the map to support the user in analysing its contents. In this paper, we give an overview of existing techniques and applications of SOMs in Digital Libraries, and present recent work on assisting the user in exploring the map by automatically describing it through advanced labelling and summarisation of map regions, focusing on text collections. With these additions, the SOM becomes an attractive tool for Information Management in large corpora.

Categories and Subject Descriptors
H.3.7 [Digital Libraries]; H.3.5 [Online Information Services]; I.7 [Document and Text Processing]

General Terms
Self-organizing maps, Neural network, Information Management

Keywords: Map interfaces, Text collection

Received 10 Sep. 2007; Reviewed and accepted 27 Jan. 2008

1.
Introduction

The Self-Organising Map (SOM) [1] is a popular unsupervised neural network model that provides a mapping from a high-dimensional input space (for example, text documents described in a vector space model) to a low-dimensional, often two-dimensional, output space. The mapping of the SOM is topology preserving: elements that are close in the input space will in general also be close in the output space.

Figure 1. Mapping of the SOM: Spatially close elements in the input space V are spatially close in the output space A as well

Due to these properties, the SOM has been used in many data mining settings, for example in several applications that automatically organise the document collections of a Digital Library by their content. Examples of such collections include text documents, as in the SOMLib Digital Library system [2] or in a map of news texts [3], music documents, as in the SOMeJB system [4], and images, as in the PicSOM system [5]. As a recent example, the Digital Library Management System (DLMS) developed by the DELOS Network of Excellence [6] also incorporates the SOM as an interface to a Digital Library’s content, as it supports the user in analysing and exploring that content.

With advanced visualisations and interaction possibilities, the user can exploit the full potential of the SOM. However, we still lack techniques that adequately help the user analyse the contents of the map. For large maps, containing several tens of thousands of documents covering many different topics, it becomes increasingly difficult to understand the map quickly.

In this paper, we give an overview of existing applications of the Self-Organising Map in Digital Libraries and of techniques to explore and interact with the map.
Furthermore, we present recent work on making the SOM more usable for Information Management by automatically describing regions of the map: we add semantic labels to the SOM, use clustering methods to identify topical areas, and select representative labels for those regions. Moreover, we present work on automatically summarising the content of those regions of the SOM.

The remainder of this paper is organised as follows: Section 2 gives a brief overview of the Self-Organising Map and its application in the context of Digital Libraries. Section 3 describes our work on labelling and summarising regions, while Section 4 presents the experiments conducted. Section 5 presents conclusions and future work.

2. Self-Organising Map

The Self-Organising Map (SOM) is a neural network model frequently employed for various data mining purposes. It provides a mapping from a high-dimensional input space to a lower-dimensional output space. Although many different architectures exist, the output space is in many applications organised as a two-dimensional rectangular grid of units, a