Effective Identification of Failure-Inducing Changes: A Hybrid Approach *

Sai Zhang, Yu Lin, Zhongxian Gu, Jianjun Zhao
School of Software, Shanghai Jiao Tong University
800 Dongchuan Road, Shanghai 200240, China
{saizhang, linyu1986, ausgoo, zhao-jj}@sjtu.edu.cn

ABSTRACT

When regression tests fail unexpectedly after a long session of editing, it may be tedious for programmers to find the failure-inducing changes by manually inspecting all code edits. To reduce the expensive effort spent on debugging, we present a hybrid approach, which combines static and dynamic analysis techniques, to automatically identify the faulty changes. Our approach first uses static change impact analysis to isolate a subset of changes responsible for a failed test, then utilizes dynamic test execution information to rank these changes according to our proposed heuristic (indicating the likelihood that they contributed to the failure), and finally employs an improved Three-Phase delta debugging algorithm, working from the coarse method level down to the fine statement level, to find a minimal set of faulty statements.

We implemented the proposed approach for both Java and AspectJ programs in our AutoFlow prototype. In our evaluation with two third-party applications, we demonstrate that this hybrid approach can be very effective: at least for the subject programs we investigated, it takes significantly (almost 4X) fewer tests than the original delta debugging algorithm to locate the faulty code.

1. INTRODUCTION

Programmers often spend a significant amount of time debugging programs in order to reduce the number of bugs in software releases. In modern software development, coding and testing are interleaved activities that assure code quality. Typically, when regression tests fail or produce an unexpected result after a long session of editing, it may indicate potential defects in the updated software.
When attempting to fix an exhibited bug, programmers usually: (1) identify statements involved in failed tests, (2) narrow the search by selecting suspicious changes that might contain faults, (3) hypothesize about the suspicious faults, and (4) restore the program variables to a specific state [8]. However, the search for suspicious changes is an arduous, highly involved, and manual process. This phase can be quite time-consuming and expensive.

The high cost of locating fault causes in software evolution has motivated the development of automatic debugging techniques, such as [7, 12, 14, 16, 17, 19, 21, 33]. Of particular interest for our work is the delta debugging algorithm proposed by Zeller [27]. In the work on delta debugging by Zeller et al. [27], the reason for a program failure is identified as a set of differences (the deltas) between program versions that distinguish a passing program execution from a failing one.

* This work was supported in part by the National High Technology Development Program of China (Grant No. 2006AA01Z158), the National Natural Science Foundation of China (NSFC) (Grant No. 60673120), and the Shanghai Pujiang Program (Grant No. 07pj14058). All work reported in this paper was done while all authors were at Shanghai Jiao Tong University. Zhongxian Gu is currently at the University of California, Davis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PASTE'08, November 9-10, 2008, Atlanta, Georgia, USA.
Copyright 2008 ACM 978-1-60558-382-2/08/11 ...$5.00.
A set of failure-inducing differences is determined by repeatedly applying different subsets of the changes (the configurations) to the original program and observing the outcome of executing the intermediate programs. By correlating the outcome of each execution with the set of changes applied, one can narrow down the set of failure-inducing changes. It was shown [27] that delta debugging is effective in finding faulty code even in large software applications like GDB [3].

However, the original delta debugging algorithm is a general, language-independent technique for minimizing the set of suspicious changes, and when applying it to a specific programming language such as Java, the approach can be improved in several ways. First, the original algorithm searches the entire set of changes to identify the failure-inducing ones; for a specific failing test, however, a portion of uncorrelated changes can be ignored, and we only need to focus on the related changes. Second, delta debugging selects and applies the changes in an arbitrary order, whereas ideally the changes most likely to contribute to the failure should be ranked highest and tried first. Third, one of the most important practical problems in delta debugging is inconsistent configurations. Since the original delta debugging builds intermediate program versions from the structural differences between a succeeding and a failing program version (e.g., changing one line or one character to generate an intermediate version), several resulting configurations are likely to be inconsistent: combinations of changes that do not result in a testable program. Finally, delta debugging treats all program changes as one flat atomic list; if we instead apply delta debugging at each level of granularity, a large portion of irrelevant changes might be pruned out earlier. There is therefore clearly scope to improve upon delta debugging.
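The iterative narrowing described above can be illustrated with a simplified, ddmin-style minimization loop. This is a hedged sketch, not the full algorithm of [27]: the `fails` predicate stands in for building an intermediate program from a configuration and running the failing test, and the integer "changes" are hypothetical placeholders for real code edits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified delta debugging: repeatedly apply subsets of the changes
// (configurations) and observe whether the intermediate program still fails.
public class DeltaDebugMin {
    // Returns a minimal subset of 'changes' that still triggers the failure.
    // 'fails' abstracts building the intermediate program and running the test.
    static <T> List<T> ddmin(List<T> changes, Predicate<List<T>> fails) {
        int n = 2; // granularity: number of chunks to split the change set into
        while (changes.size() >= 2) {
            List<List<T>> chunks = split(changes, n);
            boolean reduced = false;
            for (List<T> chunk : chunks) {
                List<T> complement = new ArrayList<>(changes);
                complement.removeAll(chunk);
                if (fails.test(complement)) { // failure persists without this chunk
                    changes = complement;     // so the chunk is irrelevant: drop it
                    n = Math.max(n - 1, 2);
                    reduced = true;
                    break;
                }
            }
            if (!reduced) {
                if (n == changes.size()) break; // finest granularity reached
                n = Math.min(n * 2, changes.size()); // refine the partition
            }
        }
        return changes;
    }

    // Partition 'list' into n contiguous chunks of roughly equal size.
    static <T> List<List<T>> split(List<T> list, int n) {
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < n; i++) {
            int end = start + (list.size() - start) / (n - i);
            chunks.add(new ArrayList<>(list.subList(start, end)));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical scenario: changes 3 and 7 together induce the failure.
        List<Integer> all = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        List<Integer> culprits = ddmin(new ArrayList<>(all),
                cfg -> cfg.contains(3) && cfg.contains(7));
        System.out.println(culprits); // prints [3, 7]
    }
}
```

Each iteration corresponds to one "apply a configuration and observe the outcome" step; the inconsistent-configuration problem discussed above arises precisely because a real `fails` predicate may be handed a complement that does not even compile.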
In this paper, we present a hybrid approach that combines static and dynamic analysis techniques to automate the debugging process. Static analysis lends itself to obtaining generalized properties from the program text, while dynamic analysis offers the semantics and ease of concrete program execution. The need for and benefit of combining the two approaches has been repeatedly stated in the software engineering community [10, 26]. More specifically, when locating the failure-inducing changes, a static analysis can soundly analyze an entire program and prune out all changes unrelated to a specific failure, while a dynamic analysis can avoid analysis ap-