Reusing Program Investigation Knowledge for Code Understanding

Martin P. Robillard and Putra Manggala
School of Computer Science
McGill University
Montréal, QC, Canada
{martin,pmangg}@cs.mcgill.ca

Abstract

Software maintenance tasks typically involve a significant amount of program investigation effort on the part of software developers. To what extent can we benefit from prior program investigation activities to decrease this effort? To investigate this question, we studied the revision history of two systems to determine how knowledge derived from prior investigation activities could have been reused to support other change tasks. Our initial investigation used a tool, ConcernDetector, that can recommend sets of program elements associated with a high-level concern when elements in the set overlap with elements currently being modified. We discovered that simple overlap-based techniques for retrieving prior investigation knowledge have important limitations, and that effective reuse of prior program investigation knowledge requires analyses that can partially infer the nature and intent of a task.

1. Introduction

During the maintenance of a mature software system, change tasks often involve parts of the system that have been modified in the past [17]. In extreme cases, a small number of complex, unstable, or poorly implemented code locations are modified on a regular basis to address unstable requirements or to fix bugs.

Performing a change task generally requires a developer to investigate the source code to identify the relevant segments. At the end of the task, this developer is likely to have located the corresponding code and to have acquired some understanding of it. Knowledge about a change task can be expressed in terms of the different concerns associated with the task [13]. Simply put, concerns are high-level concepts relevant to developers, such as individual features, requirements, or design decisions.
Unfortunately, knowledge about the implementation of concerns is all too often forgotten as the developer moves on to a different task. It is not unusual that addressing a previously modified concern after only a few months requires a re-investigation of the code.

Our goal is to mitigate the loss of tacit knowledge developers have about the implementation of concerns through the use of concern documentation, i.e., documentation linking high-level concerns with the corresponding source code. Prior studies have provided evidence that documenting concerns by identifying the source elements involved in their implementation can provide immediate benefits to developers involved in a non-trivial change task [9, 13]. We henceforth refer to this activity as concern mapping. To further maximize the benefits of concern mapping, we were interested in studying to what extent previously mapped concerns could be used to assist program investigation activities in future tasks. In particular, we were interested in determining a) how to produce concern mappings that are likely to be useful in the future, and b) the ideal strategies for retrieving concern mappings relevant to the current task.

As our initial approach, we investigated the retrieval of concerns based on a simple overlap metric. We designed a tool, ConcernDetector, that can recommend existing concern mappings to a developer when source code elements (fields and methods) modified by the developer overlap with the elements specified in previously produced mappings. Using the change history of two open-source systems, we simulated a change stream to study how ConcernDetector would have behaved in realistic contexts. We discovered that simple overlap-based techniques for identifying relevant prior investigation knowledge have important limitations, and that effective reuse of prior program investigation knowledge requires analyses that can partially infer the nature and intent of a change task.
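
As an illustration only, the overlap-based retrieval described above can be sketched in a few lines of Python. This is not ConcernDetector's implementation: the function name `recommend_concerns`, the `min_overlap` threshold, the ranking by number of shared elements, and the concern and element names are all assumptions made for the example, since the exact overlap metric is not specified at this point in the text.

```python
from typing import Dict, List, Set, Tuple


def recommend_concerns(
    modified: Set[str],
    mappings: Dict[str, Set[str]],
    min_overlap: int = 1,
) -> List[Tuple[str, int]]:
    """Return (concern name, shared element count) pairs, largest overlap first.

    `min_overlap` is a hypothetical threshold, not a parameter from the paper.
    """
    hits = []
    for name, elements in mappings.items():
        # Elements (fields and methods) that are both modified and mapped.
        shared = len(modified & elements)
        if shared >= min_overlap:
            hits.append((name, shared))
    # Sort by descending overlap, then by name for a deterministic order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical concern mappings; all element names are invented.
mappings = {
    "Autosave": {"Buffer.save()", "Buffer.dirty", "Autosave.run()"},
    "Undo": {"UndoManager.undo()", "Buffer.dirty"},
    "Printing": {"Printer.print()"},
}

# Elements the developer has modified in the current task.
modified = {"Buffer.dirty", "Buffer.save()"}

recommendations = recommend_concerns(modified, mappings)
```

In this sketch a concern is recommended as soon as it shares an element with the current change, whatever the purpose of that change. The matching uses set membership only and carries no information about the nature or intent of the task.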
The contributions of this paper include ConcernDetector, our publicly released concern recommendation tool, and the results of two empirical studies that provide a number of insights into the challenges associated with the retrieval of previous program investigation knowledge.

In Proceedings of the 16th IEEE International Conference on Program Comprehension, 2008 (C) IEEE

The rest of this paper is organized as follows. In Section 2, we describe the tools we developed to provide a prac-