Comprehending Scenario-level Software Evolution using Calling Context Trees
Sarishty Gupta, National Institute of Technology Jalandhar, India, sarishty.gupta@gmail.com
Paramvir Singh, National Institute of Technology Jalandhar, India, singhpv@nitj.ac.in
Abstract: Software evolution can be better understood when analyzed at the user-scenario level. Understanding the behavior of participating classes across a set of software versions could be helpful in scenario-level program comprehension. In this paper, we empirically investigate whether Calling Context Tree (CCT) based structural metrics provide new insights into comprehending the evolution of program scenarios. We analyze a set of four static, three dynamic, and four CCT metrics to comprehend the evolution of eight scenarios across four open-source Java applications. Correlation analysis and principal component analysis are used to analyze the relationships among the selected metrics. The results reveal that two of the CCT metrics are highly correlated with the selected static and dynamic metrics. The empirical results also suggest that CCT metrics are capable of providing additional valuable information for comprehending scenario-level software evolution.
Keywords: software evolution; calling context tree; dynamic analysis; design and analysis of algorithm
I. INTRODUCTION
Software maintenance and evolution are vital to the success of modern-day software development. Software evolution is defined as “the dynamic behaviour of software systems as they are maintained and enhanced over their lifetimes” [4]. Software systems have to be continuously updated after their deployment in order to remain useful. Constantly changing client requirements render software evolution inevitable. Software companies need their products to be able to change in response to the new demands of a growing user base and changing working environments.
Incorporating software changes for successful maintenance and evolution requires a better understanding of software functionalities, which can be achieved through effective program comprehension [22]. Software systems are becoming large and complex in order to accommodate advanced user requirements. Large software systems contain several million lines of code and voluminous documentation, which makes them quite difficult to comprehend [18]. Static program comprehension techniques face both data accuracy and performance concerns, as they cannot provide a complete understanding of such large-scale systems. Dynamic analysis techniques [3], on the other hand, may help resolve the accuracy issues of static techniques, but they are believed to incur even higher performance overhead, raising maintenance costs.
One possible solution is to reduce the scope of comprehension to the user-scenario level. Scenario-level analysis defines the behaviour of a software system from a user-centric perspective [10]. Hence, software engineers view user scenarios as a powerful way to determine user needs and to discover the behaviour of the system. The advantages of such an approach could be threefold. Firstly, it enables the application of dynamic analysis techniques, leading to more accurate program comprehension results. Secondly, it reduces the scope of program comprehension to only those code regions that implement a particular user scenario, incurring lower performance overhead. Thirdly, scenarios represent the users of the system, making the code regions covered by them the most important for program comprehension.
Calling context profiling is a dynamic analysis technique that is used to capture the dynamic inter-procedural control flow of software applications [5]. The resulting calling context trees (CCTs) have been used for program understanding (or comprehension), runtime optimization, and performance analysis in the past [17, 24].
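To make the CCT structure concrete, the following is a minimal Python sketch, not the tooling used in this study. It assumes a scenario run has already been recorded as a flat sequence of method enter/exit events. The event format, the class and function names, the sample trace, and the three structural measures (node count, height, leaf count) are all illustrative assumptions; they are not necessarily the four CCT metrics analyzed in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a method reached through a specific call chain."""
    method: str
    calls: int = 0                      # times this exact context was entered
    children: dict = field(default_factory=dict)  # callee name -> CCTNode


def build_cct(events):
    """Build a CCT from ("enter", name) / ("exit", name) events (assumed format)."""
    root = CCTNode("<root>")            # synthetic root above the entry method
    stack = [root]
    for kind, method in events:
        if kind == "enter":
            parent = stack[-1]
            node = parent.children.get(method)
            if node is None:            # first time this context is seen
                node = CCTNode(method)
                parent.children[method] = node
            node.calls += 1
            stack.append(node)
        elif kind == "exit":
            if len(stack) == 1 or stack[-1].method != method:
                raise ValueError(f"unbalanced exit for {method!r}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return root


def node_count(node):
    """Number of nodes in the subtree, including the node itself."""
    return 1 + sum(node_count(c) for c in node.children.values())


def height(node):
    """Number of edges on the longest path from the node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(c) for c in node.children.values())


def leaf_count(node):
    """Number of contexts that make no further calls."""
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children.values())


# Hypothetical trace of one scenario run: main calls parse twice and render once.
SAMPLE_TRACE = [
    ("enter", "main"),
    ("enter", "parse"), ("enter", "read"), ("exit", "read"), ("exit", "parse"),
    ("enter", "parse"), ("enter", "read"), ("exit", "read"), ("exit", "parse"),
    ("enter", "render"), ("enter", "read"), ("exit", "read"), ("exit", "render"),
    ("exit", "main"),
]

cct = build_cct(SAMPLE_TRACE)
```

In this sketch, `read` appears as two distinct nodes because it is reached through two different call chains (`main → parse → read` and `main → render → read`), which is the property that distinguishes a CCT from a plain call graph. Repeated invocations in the same context are merged into one node and only counted.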
Due to their structural characteristics, CCTs are capable of providing complete runtime information about a program’s scenario-level behaviour. Hence, they are also expected to assist in comprehending software systems at the scenario level.
This work aims to perform an empirical investigation of scenario-level software evolution using static and dynamic data collected from stable released versions of four open-source Java systems. The empirical investigation is divided into two parts: i) design and implementation of a scenario-level metric collection process for aggregating CCT and dynamic metrics; ii) correlation analysis among a set of static, dynamic, and CCT metrics. The overall contribution is a CCT-based dynamic analysis approach that helps comprehend how a particular user scenario evolves across multiple software versions.
II. RELATED WORK
Belady and Lehman [4, 13] postulated the laws of program dynamics. They identified a set of programming process parameters and discussed statistical models of the programming process over a system’s life cycle. Based on these studies, Lehman et al. [14] proposed the laws of software evolution. Kemerer and Slaughter [11] conducted longitudinal empirical research on software evolution, focusing on how software development cost and effort change over time. They then empirically evaluated the existing Lehman laws of software evolution.