Interactive Exploration of Semantic Clusters

In Proceedings of the International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT 2005)

Mircea Lungu¹, Adrian Kuhn², Tudor Gîrba³ and Michele Lanza⁴
¹,⁴ Faculty of Informatics, University of Lugano, Switzerland
²,³ Software Composition Group, University of Berne, Switzerland

Abstract

Visualization and exploration tools are of great use for understanding a software system when only its source code is available. However, understanding a large software system by visualizing only its lower-level artifacts (e.g., classes, methods) and the relations between them does not scale to industrial-size systems. To address the scalability issue, higher-level hierarchical abstractions (e.g., the package structure or clustered decompositions of the system) should be used, together with relations between them that are usually aggregated from the lower-level relations. In this paper, we present the concepts behind Softwarenaut, a tool aimed at exploring any kind of hierarchical decomposition of a system, and then we examine a specific exploration of a system. In the experiment, the hierarchical decomposition of the system is the result of applying semantic clustering, which groups classes that use similar terms.

Keywords: software exploration, visualization, clustering, LSI

1 Introduction

When only the source code is available, recovering the architecture of a large software system is a difficult task because of its sheer size and complexity. Even in the presence of a well-thought-out initial design, evolutionary processes such as bug fixing and feature additions lead to a decay of both the architecture and the source code itself [3]. Considering this, the difficulty of understanding a legacy system only by analyzing its code becomes obvious.
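The abstract states that relations between higher-level abstractions are usually aggregated from lower-level relations. The following is a minimal sketch of that idea, not Softwarenaut's implementation: it assumes the hierarchy is encoded as dotted paths (a package name, or a path in a cluster tree), that the class names and dependencies are hypothetical, and that an aggregated edge is weighted by the number of low-level relations it summarizes.

```python
from collections import Counter

# Hypothetical input: each class's position in a hierarchical decomposition,
# written as a dotted path (e.g. a package name or a path in a cluster tree).
HIERARCHY = {
    "Parser": "core.parsing",
    "Lexer": "core.parsing",
    "Canvas": "ui.render",
    "Window": "ui.widgets",
}

# Hypothetical low-level relations between classes (e.g. method invocations).
DEPENDENCIES = [
    ("Parser", "Lexer"),
    ("Window", "Canvas"),
    ("Window", "Parser"),
    ("Canvas", "Lexer"),
    ("Canvas", "Parser"),
]


def ancestor(hierarchy, cls, depth):
    """Node that contains `cls` at the given depth (1 = top level)."""
    return ".".join(hierarchy[cls].split(".")[:depth])


def aggregate(hierarchy, deps, depth):
    """Lift class-level relations to relations between nodes at `depth`.

    Each aggregated edge is weighted by the number of low-level relations
    it summarizes; relations internal to a single node are dropped.
    """
    edges = Counter()
    for src, dst in deps:
        a, b = ancestor(hierarchy, src, depth), ancestor(hierarchy, dst, depth)
        if a != b:
            edges[(a, b)] += 1
    return dict(edges)


print(aggregate(HIERARCHY, DEPENDENCIES, 1))  # {('ui', 'core'): 3}
print(aggregate(HIERARCHY, DEPENDENCIES, 2))
# {('ui.widgets', 'ui.render'): 1, ('ui.widgets', 'core.parsing'): 1, ('ui.render', 'core.parsing'): 2}
```

Changing `depth` moves the view up or down the hierarchy: at the top level, five class relations collapse into a single weighted edge, which is what keeps the picture readable for large systems.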
One approach to software reverse engineering is the use of visualization techniques to represent software entities and their relationships [14, 13, 7]. However, understanding a large software system by visualizing only its lower-level artifacts (e.g., classes, files) and the relations between them does not scale to industrial-size systems. To address the scalability issue, we use higher-level hierarchical abstractions (e.g., the package structure) and the relations between them. When there are no explicit relationships between abstractions, we aggregate them from the lower-level relations. Furthermore, we present the reverse engineer with several integrated, complementary views of the system: a view of the current high-level focus, a map showing the location of the current focus, and a detailed view of a selection.

Another widely used approach in reverse engineering is clustering [5, 10]. Clustering techniques provide an automatic way of separating a complex system into simpler components. In the case of hierarchical clustering, for example, the system is decomposed into a hierarchy of nested decompositions, which have to be inspected manually to detect the right level of abstraction.

In this article, we present an approach to interactively explore the hierarchical clusters obtained by grouping classes that use similar terms. The approach is based on Softwarenaut, an environment for the interactive, visual exploration of any hierarchical decomposition of a software system. Here, the hierarchical decomposition is provided by Hapax, our semantic analysis framework. For the semantic clustering, Hapax uses an information retrieval technique called Latent Semantic Indexing (LSI) [6]. The user can interact with and navigate the visualizations of the semantic clusters, aided by complementary lower-level information about the properties of the components of the clusters and the interconnections between them.
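To make the combination of LSI and hierarchical clustering concrete, here is a minimal sketch under stated assumptions; it is not how Hapax is implemented. It assumes hypothetical per-class vocabularies, raw term counts without any weighting, a rank-k LSI space computed from the eigenpairs of AᵀA (equivalent to the right singular vectors and singular values of the term-document matrix A), cosine similarity, and average-linkage agglomerative clustering. The result is a binary dendrogram, i.e. the kind of hierarchical decomposition that would then be explored interactively.

```python
from collections import Counter
from math import sqrt

# Hypothetical per-class vocabularies (identifiers and comments split into
# words). Real input would be extracted from the source code.
CLASSES = {
    "Parser": "parse token grammar token",
    "Lexer": "token scan grammar",
    "Canvas": "draw pixel render",
    "Window": "render window draw",
}


def term_document_matrix(docs):
    """Raw term counts A: one row per term, one column per class."""
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    terms = sorted({t for c in counts.values() for t in c})
    names = list(docs)
    return names, [[counts[n][t] for n in names] for t in terms]


def top_eigenpairs(g, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(g)
    g = [row[:] for row in g]
    pairs = []
    for _ in range(k):
        v, lam = [1.0] * n, 0.0
        for _ in range(iterations):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v, lam = [x / norm for x in w], norm
        pairs.append((lam, v))
        # Deflate so that the next pass converges to the next eigenpair.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_vectors(docs, k=2):
    """Class coordinates in the rank-k LSI space, i.e. the rows of V_k S_k.

    Since A^T A = V S^2 V^T, its eigenvectors give V and the square roots
    of its eigenvalues give the singular values S.
    """
    names, a = term_document_matrix(docs)
    n = len(names)
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return {name: [v[d] * sqrt(lam) for lam, v in pairs] for d, name in enumerate(names)}


def cosine(u, v):
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(x * x for x in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else sum(x * y for x, y in zip(u, v)) / (nu * nv)


def leaves(tree):
    """Class names under a dendrogram node (a name or a nested pair)."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def average_linkage(vectors):
    """Agglomerative clustering on LSI cosine similarity; returns nested pairs."""
    def sim(a, b):
        pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
        return sum(cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    clusters = list(vectors)  # start with one singleton cluster per class
    while len(clusters) > 1:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)] + [merged]
    return clusters[0]


vectors = lsi_vectors(CLASSES, k=2)
print(average_linkage(vectors))
```

In this toy corpus the parsing classes and the drawing classes share no terms, so they end up in separate subtrees of the dendrogram; choosing where to cut such a tree is exactly the manual inspection step that interactive exploration is meant to support.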
Structure of the paper. We start by presenting the model for the hierarchical structures that can be explored with the techniques presented in this article. In Section 3 we describe the visualization and exploration techniques that