PREDICTABILITY MEASURES FOR SOFTWARE RELIABILITY MODELS

Yashwant K. Malaiya and Nachimuthu Karunanithi
Computer Science Department
Colorado State University, Fort Collins, CO 80523.

Pradeep Verma
Hewlett-Packard, Information Network Division
19420 Homestead Road, Cupertino, CA 95014.

Abstract

It is critical to be able to achieve an acceptable quality level before a software package is released, and it is often important to meet a target release date. To be able to estimate the testing effort required, it is necessary to use a software reliability growth model. While several different software reliability growth models have been proposed, there exist no clear guidelines about which model should be used. Here a two-component predictability measure is presented that characterizes the long-term predictability of a model. The first component, average predictability, measures how well a model predicts throughout the testing phase. The second component, average bias, is a measure of the general tendency to overestimate or underestimate the number of faults. Data sets for both large and small projects from diverse sources have been analyzed. The results presented here indicate that some models perform better than others in most cases.

1 INTRODUCTION

A software product can be released only after some threshold reliability criterion has been satisfied. It is necessary to use some heuristics to estimate the required test time so that the available resources can be efficiently apportioned. The most useful reliability criteria are residual fault density and failure intensity (or its inverse, MTTF). One of the best approaches to determining the required testing time is to use a time-based Software Reliability Growth Model (SRGM). In recent years researchers have proposed several different SRGMs; a comprehensive survey and classification of software reliability models can be found in [5,13].

There is evidence to suggest that different models have different prediction capabilities, especially during the early phases of testing. This is the period when better predictability is needed to estimate the release date and the additional test effort required. Hence the selection of a particular model can be important for obtaining a reliable estimate of the reliability of a software system.

Here five of the most commonly used fault-count models are considered. All of them are two-parameter models, which allows a fair comparison among the models, and it was felt that they represent a sufficiently wide range of presumed behavior. All the models considered are NHPP (Non-Homogeneous Poisson Process) models, with the exception of the inverse-polynomial model.

The most common approach is to use grouped data. The testing duration is divided into a number of periods, and for each period one item of the data set $\{t_i, \lambda_i\}$, or equivalently $\{t_i, \mu_i\}$, is obtained. The major objective of using a model is to estimate the time $t_F$ at which the failure intensity $\lambda(t_F)$ will have fallen below an acceptable threshold. Since the number of data points in a data set is often not large, we have used the least squares method in our experiments; the maximum likelihood method has been found to perform similarly in this application [13].
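For concreteness, the following sketch (not from the paper; the period lengths, fault counts, and function name are made-up illustrations) shows how the grouped pairs $\{t_i, \lambda_i\}$ can be obtained from cumulative fault counts recorded at the end of each test period.

```python
# Minimal sketch: build the grouped data set {t_i, lambda_i} from
# cumulative fault counts recorded at the end of each test period.
# All values and names below are illustrative assumptions.

def grouped_intensities(period_ends, cumulative_faults):
    """Return (t_i, lambda_i) pairs, where lambda_i is the observed
    failure intensity over period i (faults found per unit test time)."""
    data = []
    prev_t, prev_mu = 0.0, 0.0
    for t_i, mu_i in zip(period_ends, cumulative_faults):
        lam_i = (mu_i - prev_mu) / (t_i - prev_t)  # faults per unit time
        data.append((t_i, lam_i))
        prev_t, prev_mu = t_i, mu_i
    return data

# Example: five 10-hour test periods with hypothetical cumulative fault counts
print(grouped_intensities([10, 20, 30, 40, 50], [25, 42, 54, 62, 67]))
# -> [(10, 2.5), (20, 1.7), (30, 1.2), (40, 0.8), (50, 0.5)]
```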
Logarithmic Model (LOGM): This model was proposed by Musa and Okumoto [12]. Here the underlying software failure process has the characteristics of a logarithmic Poisson process: its intensity function decreases exponentially with the number of failures experienced. The mean value function and the failure intensity are [13]:

\[
\mu(t;\beta) = \beta_0 \ln(1 + \beta_1 t), \qquad
\lambda(t;\beta) = \frac{\beta_0 \beta_1}{1 + \beta_1 t}.
\]

It should be noted that the failure intensity can also be expressed as:

\[
\lambda(\mu;\beta) = \beta_0 \beta_1 \exp\!\left(-\frac{\mu}{\beta_0}\right).
\]

The sum of the squares of the errors, $S$, is given by:

\[
S(\beta_0, \beta_1) = \sum_i \left[\ln \lambda_i - \ln \beta_0 \beta_1 + \ln(1 + \beta_1 t_i)\right]^2,
\]

where $\lambda_i$ is the actual failure intensity at $t_i$, calculated from the input data. Minimizing this expression results in
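As a concrete illustration, the sketch below fits the logarithmic model to grouped data by minimizing $S(\beta_0, \beta_1)$ numerically and then solves $\lambda(t_F) = \lambda_F$ for the required test time. This is not the authors' implementation; the data values (taken from the earlier illustrative example), the starting guesses, the threshold `lam_F`, and the use of SciPy's Nelder-Mead minimizer are all assumptions made for the example.

```python
# Sketch: least-squares fit of the logarithmic (Musa-Okumoto) model to
# grouped data (t_i, lambda_i) by minimizing S(beta0, beta1) as defined
# above, then estimating the time t_F at which lambda(t) reaches a
# target failure intensity.  Data and threshold are illustrative.
import numpy as np
from scipy.optimize import minimize

t = np.array([10., 20., 30., 40., 50.])      # end of each test period
lam = np.array([2.5, 1.7, 1.2, 0.8, 0.5])    # observed failure intensities

def S(beta):
    """Sum of squared errors in log failure intensity (the S above)."""
    b0, b1 = beta
    if b0 <= 0 or b1 <= 0:
        return np.inf                         # keep the search in the valid region
    return np.sum((np.log(lam) - np.log(b0 * b1) + np.log(1 + b1 * t)) ** 2)

# Nelder-Mead avoids derivatives; the starting values are rough guesses.
fit = minimize(S, x0=[50.0, 0.05], method="Nelder-Mead")
b0, b1 = fit.x

# lambda(t) = b0*b1 / (1 + b1*t); solve lambda(t_F) = lam_F for t_F.
lam_F = 0.1                                   # acceptable failure intensity
t_F = (b0 * b1 / lam_F - 1) / b1
print(f"beta0 = {b0:.2f}, beta1 = {b1:.4f}, estimated t_F = {t_F:.1f}")
```

Fitting in terms of $\ln \lambda$ rather than $\lambda$, as $S$ does, keeps the early high-intensity periods from dominating the fit when the observed intensities span a wide range.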