Software Reliability Analysis Incorporating Fault Detection and Debugging Activities

Swapna S. Gokhale, Bourns College of Engg., University of California, Riverside, CA 92521, swapna@cs.ucr.edu
Michael R. Lyu, Dept. of Computer Science & Engg., Chinese University of Hong Kong, Shatin NT, Hong Kong, lyu@cse.cukh.edu.hk
Kishor S. Trivedi, Dept. of Electrical Engg., CACC, Duke University, Durham, NC 27708, kst@ee.duke.edu

Abstract

The software reliability measurement problem can be approached by estimating the residual number of faults in the software. Traditional black-box approaches to software reliability modeling assume that the debugging process is instantaneous and perfect. The estimates of the remaining number of faults, and hence of reliability, rest on these oversimplified assumptions and therefore tend to be optimistic. In this paper, we propose a framework, based on a rate-based simulation technique, for incorporating explicit debugging activities, along with the possibility of imperfect debugging, into black-box software reliability models. We present various debugging policies and analyze their effect on the residual number of faults in the software. In addition, we propose a methodology to compute the reliability of the software that takes explicit debugging activities into account. We also describe an economic cost model that determines the optimal software release criteria in the presence of debugging activities. Finally, we present the high-level architecture of a tool, called SRSIM, which automates the simulation techniques described in this paper.
This work was done when the author was a graduate student at Duke University.
Supported by the Direct Grant from the Chinese University of Hong Kong.
Supported by a contract from Charles Stark Draper Laboratory and in part by Bellcore as a core project in the Center for Advanced Computing and Communication.

1 Introduction

Software reliability is accepted as a key attribute of software quality, and is defined as the probability of failure-free software operation for a specified period of time in a specified environment [11]. The residual faults in a software system directly contribute to its failure rate and thus cause software unreliability. Therefore, the problem of measuring software reliability can be approached by estimating the residual number of faults in the software. The number of faults that remain in the code is also an important measure for the software developer when planning maintenance activities. This is especially true for the developer of a commercial off-the-shelf software package that will run on thousands of individual systems. Although the reliability of commercial software is important to its users, the users never report their reliability experience. Instead, they report the occurrence of a specific failure to the software development organization, expecting the underlying fault to be fixed so that the failure does not recur. Thus, commercial software organizations focus on the residual number of faults, rather than on reliability, as a measure of software quality [7].

A plethora of black-box software reliability models [4] have appeared in the literature, and most of them assume that a software fault is fixed immediately upon detection and that no new faults are introduced during the debugging process. This assumption of instantaneous and perfect debugging is unrealistic [15] and should be relaxed in order to represent more realistic testing scenarios.
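To make the instantaneous-debugging assumption concrete, consider the Goel-Okumoto model, one widely used black-box NHPP model (chosen here only as an illustration; the text does not single out a particular model, and the symbol N_res is our own notation). With a denoting the expected total number of faults and b the per-fault detection rate, the expected number of faults detected by time t, the failure intensity, and, under instantaneous and perfect debugging, the expected number of residual faults are:

```latex
m(t) = a\left(1 - e^{-bt}\right), \qquad
\lambda(t) = \frac{dm(t)}{dt} = a\,b\,e^{-bt}, \qquad
E\left[N_{\mathrm{res}}(t)\right] = a - m(t) = a\,e^{-bt}.
```

The last equality holds only because every detected fault is assumed to be removed at the instant of detection. If detected faults instead wait to be fixed, and the detection process is otherwise unchanged, the residual count is a minus the number of faults actually fixed by time t; since no more faults can be fixed than have been detected, its expectation is at least a e^{-bt}. The instantaneous-debugging value is therefore an optimistic lower bound.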
The time lag between the detection and the debugging of a fault is not explicitly accounted for in traditional software reliability models, because it complicates the failure process significantly, making it impossible to obtain closed-form expressions for various metrics of interest. However, the estimate of the residual number of faults in the software is influenced not only by the detection process, but also by the time required to debug the detected faults. The debugging process thus affects the number of faults remaining in the software, and consequently its reliability, and has a direct impact on the quality of a software product. The other stringent assumption is
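Because closed forms are unavailable once a detection-to-fix lag is introduced, the effect can be explored numerically with a simulation in the spirit of the rate-based technique mentioned in the abstract. The sketch below is a minimal illustration under hypothetical assumptions of our own; it is not the paper's SRSIM tool or its exact debugging policies. Detection follows Goel-Okumoto-style rates (each undetected fault is found at rate b), a single debugger fixes waiting faults at a constant rate mu, and each fix injects a new fault with probability p_new (imperfect debugging). In each small interval dt, an event occurs with probability rate × dt, which is accurate only when rate × dt is much smaller than 1. The function name simulate and all parameter values are hypothetical.

```python
import math
import random


def simulate(a, b, mu, p_new=0.0, t_end=60.0, dt=0.01, seed=0):
    """Rate-based simulation of fault detection plus debugging (a sketch).

    Hypothetical assumptions, not taken from the paper:
      * each undetected fault is detected at rate b, so the total
        detection rate is b * undetected (Goel-Okumoto style);
      * one debugger fixes detected faults at constant rate mu while
        any are waiting; mu = math.inf means instantaneous debugging;
      * each fix introduces a new undetected fault with probability p_new.
    Returns (undetected, pending, fixed) at time t_end.
    """
    rng = random.Random(seed)
    undetected, pending, fixed = a, 0, 0
    for _ in range(int(round(t_end / dt))):
        # Detection event in [t, t + dt) with probability rate * dt
        # (valid only while b * undetected * dt is much smaller than 1).
        if rng.random() < b * undetected * dt:
            undetected -= 1
            pending += 1
        # Debugging: fix everything at once, or at most one fault per step.
        if math.isinf(mu):
            fixes = pending
        elif pending > 0 and rng.random() < mu * dt:
            fixes = 1
        else:
            fixes = 0
        for _ in range(fixes):
            pending -= 1
            fixed += 1
            # Imperfect debugging: the fix may inject a new fault.
            if rng.random() < p_new:
                undetected += 1
    return undetected, pending, fixed


if __name__ == "__main__":
    # Compare instantaneous debugging with a finite debugging rate.
    for mu in (math.inf, 0.5):
        u, p, f = simulate(a=100, b=0.05, mu=mu)
        print(f"mu={mu}: residual faults = {u + p} "
              f"(undetected {u}, awaiting fix {p}, fixed {f})")
```

With mu = math.inf the sketch reduces to the instantaneous-debugging assumption, so the residual count equals the undetected count. With a finite mu, detected but unfixed faults accumulate and the residual count grows; setting p_new above zero additionally models imperfect debugging.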