Probabilistic Slicing for Predictive Impact Analysis

Raul Santelices and Mary Jean Harrold
College of Computing, Georgia Institute of Technology
E-mail: {raul|harrold}@cc.gatech.edu

ABSTRACT

Program slicing is a technique that determines which statements in a program affect or are affected by another statement in that program. Static forward slicing, in particular, can be used for impact analysis by identifying all potential effects of changes in software. This information helps developers design and test their changes. Unfortunately, static slicing is too imprecise: it often produces large sets of potentially affected statements, limiting its usefulness. To reduce the resulting set of statements, other forms of slicing have been proposed, such as dynamic slicing and thin slicing, but they can miss relevant statements. In this paper, we present a new technique, called Probabilistic Slicing (p-slicing), that augments a static forward slice with a relevance score for each statement by exploiting the observation that not all statements have the same probability of being affected by a change. P-slicing can be used, for example, to focus the attention of developers on the “most impacted” parts of the program first. It can also help testers, for example, by estimating the difficulty of “killing” a particular mutant in mutation testing and prioritizing test cases. We also present an empirical study that shows the effectiveness of p-slicing for predictive impact analysis and we discuss potential benefits for other tasks.

1. INTRODUCTION

Software is constantly modified during its life cycle, resulting in many challenges for developers because changes might not behave as expected or may introduce erroneous side effects. Whenever software must be modified to achieve some goal (e.g., fix errors, add new functionality, or improve the quality of the code), developers must assess the parts of the software potentially impacted by planned changes.
This change-planning task is particularly delicate in large and complex software where developers do not fully understand the consequences that a change might have. Also, after software is modified, developers must identify the parts affected by changes so that they can retest those parts. To help accomplish these tasks, developers use change impact analysis (or, simply, impact analysis), which determines, for some level of granularity (e.g., statements, modules, features), which entities in the software can be affected by changes.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright Georgia Institute of Technology. November, 2010.

Program slicing is a well-known technique that can be used for various software-quality tasks, including impact analysis. Program slicing was originally developed as a backward analysis of program code to aid in its comprehension and debugging [22]. Its counterpart, forward slicing, is used for change-related tasks, such as impact analysis [4], regression testing [15], and program integration [3, 18]. These techniques, collectively called static slicing, use the control and data dependencies in the code to identify the set of all statements that can affect or be affected by another statement. Unfortunately, static slicing is often too imprecise for practical use because it tends to produce very large sets of statements. To reduce the resulting set of statements, other forms of slicing have been developed, such as dynamic slicing [10] and thin slicing [20], but they are incomplete and can miss important parts of the code.
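To make the notion of a static forward slice concrete, the following minimal sketch treats it as plain reachability over a dependence graph. It is an illustration under stated assumptions, not the analysis implemented for this paper: the graph `DEPS`, the statement labels `s1` to `s6`, and the function `forward_slice` are all hypothetical, and data and control dependencies are deliberately not distinguished.

```python
from collections import deque

def forward_slice(deps, start):
    """Return every statement reachable from `start` along dependence edges.

    `deps` maps a statement to the statements that directly depend on it
    (data or control dependence; this sketch does not tell them apart).
    The starting statement is included in the result by convention.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        stmt = queue.popleft()
        for dependent in deps.get(stmt, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Hypothetical dependence graph: an edge a -> b means "b depends on a".
DEPS = {
    "s1": ["s2", "s3"],  # s2 and s3 use a value defined at s1
    "s2": ["s4"],        # s4 executes only if the branch at s2 is taken
    "s3": ["s5"],
    "s6": ["s5"],        # s6 also feeds s5 but is not affected by s1
}

slice_of_s1 = forward_slice(DEPS, "s1")
print(sorted(slice_of_s1))
```

Every statement in the result is reported as equally "affected", whether it is one dependence away from `s1` or many. That flat yes/no answer is the imprecision that motivates attaching a score to each statement.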

To address the limitations of previous slicing techniques for use in impact analysis, we developed, and present in this paper, a new technique, called Probabilistic Slicing (p-slicing), that augments a forward slice with a relevance score for each statement. P-slicing exploits the observation that not all statements have the same probability of being affected by a change. Intuitively, a forward slice only indicates “whether” a statement s is affected by a change C, no matter how small that influence might be, whereas the relevance score indicates “how much” s is affected by C. The relevance score is computed by statically analyzing the probability that (1) an execution reaches s after executing C, (2) a sequence of data and control dependencies is exercised between C and s, and (3) a modification of the program state (an infection [21]) propagates from C to s through that sequence of dependencies. If these three events occur, then either the execution history of s (i.e., the number of occurrences of s during an execution) or the values computed or branching decisions taken at s are modified due to C, and, thus, s is impacted by C.

Our new technique exploits two important insights not fully considered in existing program-slicing research:

1. Our technique recognizes that some data dependencies are less likely to be covered than others because the conditions to reach a use from a definition can be more complex and difficult to satisfy in some cases. To incorporate this factor, p-slicing uses an interprocedural reachability analysis to estimate the probability that a use is reached by a definition.

2. Our technique not only recognizes that data dependencies are more likely to propagate infections than control dependencies [8, 12, 20], but, unlike techniques that discard some or all control dependencies [20], it includes control dependencies from the beginning and assigns them a lower propagation probability than data dependencies.

To perform forward slicing and compute the probabilities (relevance scores) for each statement, p-slicing annotates the interproce-