Time Will Tell: Fault Localization Using Time Spectra

Cemal Yilmaz, Amit Paradkar, and Clay Williams
IBM T. J. Watson Research Center
Hawthorne, NY 10532
{cyilmaz, paradkar, clayw}@us.ibm.com

ABSTRACT
We present an automatic fault localization technique which leverages time spectra as abstractions for program executions. Time spectra have been traditionally used for performance debugging. By contrast, we use them for functional correctness debugging by identifying pieces of program code that take a “suspicious” amount of time to execute. The approach can be summarized as follows: Time spectra are collected from passing and failing runs, observed behavior models are created using the time spectra collected from passing runs, and deviations from these models in failing runs are identified and scored as potential causes of failures. Our empirical evaluations conducted on three real-life projects suggest that the proposed approach can effectively reduce the space of potential root causes for failures, which can in turn improve the turn around time for fixes.

Categories and Subject Descriptors
D.2.5 [Testing and Debugging]: Debugging aids

General Terms
Measurement, Experimentation, Reliability

Keywords
Fault localization, Automated debugging

1. INTRODUCTION
Program debugging is a process of identifying and fixing bugs. Identifying the root causes is the hardest, thus the most expensive, component of debugging. Developers often take a slice of the statements involved in a failure, hypothesize a set of potential causes in an ad hoc manner, and iteratively verify and refine their hypotheses until root causes are located. Obviously, this process can be quite tedious and time-consuming.

Many approaches have been proposed in the past to facilitate fault localization.
They all have the same ultimate goal, to narrow down the space of potential root causes for developers, but differ in how they achieve it.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE’08, May 10–18, 2008, Leipzig, Germany. Copyright 2008 ACM 978-1-60558-079-1/08/05 ...$5.00.

Perhaps the most actively studied approaches are the program spectrum-based ones. A program spectrum can be considered an abstraction of a program execution. Various forms of spectra can be defined. Statements executed, branches covered, and call sequences observed in executions are just a few examples.

All program spectrum-based approaches operate in the same way: Program spectra are collected from passing and failing runs, models that capture the behavior of the program as observed in passing runs are created, and then deviations from these models in failing runs are identified and scored as potential causes of failures.

The fundamental assumption behind this approach is that the “observed behavior” (as observed from passing runs) is the same with respect to the abstracted feature as the “correct behavior” of the program (as documented in requirement specifications), or at least a safe subset of the correct behavior. Therefore, any deviation from the observed behavior is considered a deviation from the correct behavior, even though this might not always be the case.

Although this assumption is highly debatable, many studies suggest that it may hold in practice [15, 14, 10, 4]. Our personal experience also supports this assumption.
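The collect, model, and score steps shared by spectrum-based approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the technique proposed in this paper: it assumes each spectrum is a dictionary mapping a feature name to a numeric value, uses a simple min/max range as the observed behavior model, and scores a feature by the fraction of failing runs in which it deviates. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_model(passing_spectra):
    """Record, per feature, the (min, max) range observed in passing runs."""
    model = {}
    for spectrum in passing_spectra:
        for feature, value in spectrum.items():
            low, high = model.get(feature, (value, value))
            model[feature] = (min(low, value), max(high, value))
    return model


def score_deviations(model, failing_spectra):
    """Score each deviating feature by the fraction of failing runs it deviates in."""
    counts = defaultdict(int)
    for spectrum in failing_spectra:
        for feature, value in spectrum.items():
            if feature not in model:
                # Assumption: a feature never seen in a passing run is a deviation.
                counts[feature] += 1
                continue
            low, high = model[feature]
            if value < low or value > high:
                counts[feature] += 1
    total = len(failing_spectra)
    if total == 0:
        return {}
    return {feature: count / total for feature, count in counts.items()}


def rank(scores):
    """Order features from most to least suspicious (ties broken by name)."""
    return sorted(scores, key=lambda feature: (-scores[feature], feature))


if __name__ == "__main__":
    # Hypothetical spectra: feature name -> observed value in one run.
    passing = [{"parse": 3, "eval": 5}, {"parse": 4, "eval": 6}]
    failing = [{"parse": 3, "eval": 40}, {"parse": 4, "eval": 55, "log": 1}]
    scores = score_deviations(build_model(passing), failing)
    print(rank(scores))
```

Features that stay within the range seen in passing runs receive no score at all, which reflects the assumption discussed above: the observed behavior is treated as a stand-in for the correct behavior.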
We have often observed that comparing passing and failing runs helps developers pinpoint the root causes of failures.

On the other hand, we also noticed that leveraging an adequate type of program spectrum is the key to success, since it reduces the gap between the observed and the correct behavior models. To this end, we believe that the types of program spectra currently in use suffer from some major limitations.

One common limitation is that the existing spectra focus only on a very specific feature of program executions and collect precise information about that feature. Although the resulting abstractions are good at answering queries on the monitored feature, they cannot answer queries on even slightly different features. For example, statement coverage information would tell which statements were executed in a run, but cannot tell if two statements in a method were ever executed together.

Another limitation is that existing spectra often do not deal with sequences of events that happen in executions. Even the ones that capture some form of sequence information limit themselves to sequences of up to a certain length [14] because of the combinatorial complexity involved in analyzing sequences. Since a program execution is a se-