Selective Introduction of Aspects for Program Comprehension

Andy Zaidman*, Toon Calders+, Serge Demeyer*, Jan Paredaens+
+ Advanced Database Research and Modelling (ADReM)
* Lab On Re-Engineering (LORE)
University of Antwerp
Department of Mathematics and Computer Science
Middelheimlaan 1, 2020 Antwerp, Belgium
{Andy.Zaidman, Toon.Calders, Serge.Demeyer, Jan.Paredaens}@ua.ac.be

Abstract

We propose a technique that applies web mining principles to event traces in order to uncover important classes in a system’s architecture. These classes can form starting points for the program comprehension process. Furthermore, we argue that these important classes can be used to define pointcuts for the introduction of aspects. Based on a medium-scale case study (Apache Ant) and detailed architectural information from its developers, we show that the important classes found by our technique are prime candidates for the introduction of aspects.

1 Introduction

Program comprehension is the process of understanding a system through feature and documentation analysis [11]. Gaining an understanding of a program is a time-consuming task, taking up to 40% of the time budget of a maintenance operation [15]. The manner in which a programmer gains an understanding of a software system varies greatly and depends on the individual, the magnitude of the program, the level of understanding needed, the kind of system, and other factors [10].

Studies and experiments reveal that the success of decomposing a program into effective mental models depends on one’s general and program-specific domain knowledge. While a number of different models for the cognition process have been identified, most models fall into one of three categories: top-down comprehension, bottom-up comprehension, or a hybrid model combining the previous two [12]. The top-down model is traditionally employed by programmers with code domain familiarity.
By drawing on their existing domain knowledge, programmers are able to efficiently reconcile application source code with system goals. The bottom-up model is often applied by programmers working on unfamiliar code [4]. To comprehend the application, they build mental models by evaluating program code against their general programming knowledge [11].

For large industrial-scale systems, the program comprehension process requires the inspection and study of a significant number of packages, classes and code. As such, a semi-automated process, in which an analysis tool supports the identification of key classes in a system’s architecture and presents these to the user, suits the hybrid cognitive model that is frequently used in large-scale systems [11].

Program understanding can be attained by using one of several strategies, namely (1) static analysis, i.e., examining the source code, (2) dynamic analysis, i.e., examining the program’s behavior, or (3) a combination of both. In the context of object-oriented systems, static analysis is often imprecise with regard to the actual behavior of the application because of polymorphism. Dynamic analysis, however, allows us to create an exact image of the program’s intended runtime behavior. Our actual goal is to find frequently occurring interaction patterns between classes. These interaction patterns can help us (1) build up an understanding of the software, and (2) locate candidate introduction points for aspects.

In this paper we propose a technique that applies data mining techniques to event traces of program runs. As such, our technique can be classified as a dynamic analysis technique. The technique we use was originally developed to identify important hubs on the Internet, i.e., pages with many links to authoritative pages, based only on the links between web pages [9]. Hence, the Internet is viewed as a large graph. We verify that important classes in the pro-