Early Estimation of Defect Density Using an In-Process Haskell Metrics Model

Mark Sherriff¹, Nachiappan Nagappan², Laurie Williams¹, Mladen Vouk¹

¹North Carolina State University, Raleigh, NC 27695
{mssherri, lawilli3, vouk}@ncsu.edu

²Microsoft Research, Redmond, WA 98052
nachin@microsoft.com

ABSTRACT

Early estimation of a product's defect density is an important step toward affordably guiding corrective actions in the software development process. This paper presents a suite of in-process metrics that leverages the software testing effort to create a defect density prediction model for use throughout the software development process. A case study conducted with Galois Connections, Inc. in a Haskell programming environment suggests that the resulting defect density prediction is indicative of the actual system defect density.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics - Performance measures, Process metrics, Product metrics.

General Terms
Measurement, Reliability.

Keywords
Empirical software engineering, multiple regression, software quality, Haskell.

1. INTRODUCTION

In industry, the actual defect density of a software system cannot be measured until the system has been released in the field and used extensively by end users. Defect density information gathered from end users therefore becomes available too late in the software lifecycle to affordably guide corrective actions that improve software quality. Correcting software defects after they have reached the end user is significantly more expensive than correcting them earlier in the development process [3].

Software developers can benefit from an early warning of defect density. This early warning can be built from a collection of internal, in-process metrics that are correlated with actual defect density, an external measure.
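As a minimal sketch of how such an early warning could be computed, the Python code below fits an ordinary least-squares multiple regression (the technique named in the keywords) from the in-process metric values of completed projects to their observed defect density, then applies the fitted model to a project still in development. It assumes the common definition of defect density as defects per thousand lines of code (KLOC). The metric names `metric_a` and `metric_b` and all data values are hypothetical placeholders, not the STREW-H metrics or results.

```python
from typing import List, Sequence


def defect_density(defects: int, kloc: float) -> float:
    """Defects per thousand lines of code (assumed definition)."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return defects / kloc


def fit_linear_model(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, b1, ..., bk].

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting. X gets a leading column of ones
    so that the first coefficient is the intercept.
    """
    X = [[1.0, *map(float, r)] for r in rows]
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(X[m][i] * X[m][j] for m in range(n)) for j in range(k)]
        + [sum(X[m][i] * y[m] for m in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("metrics are collinear; model is not identifiable")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The left block is now diagonal, so each coefficient is a single division.
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs: Sequence[float], metrics: Sequence[float]) -> float:
    """Estimated defect density for one project's in-process metric values."""
    return coefs[0] + sum(c * m for c, m in zip(coefs[1:], metrics))


if __name__ == "__main__":
    # Hypothetical history: (metric_a, metric_b) for finished projects and the
    # defect density (defects/KLOC) later observed in the field.
    history = [(0.1, 0.3), (0.4, 0.2), (0.3, 0.9), (0.8, 0.5), (0.6, 0.7)]
    observed = [0.4, 1.1, 0.2, 1.6, 1.0]
    coefs = fit_linear_model(history, observed)
    print("coefficients:", [round(c, 3) for c in coefs])
    # Early estimate for an in-progress project with hypothetical metric values.
    print("early estimate:", round(predict(coefs, (0.5, 0.5)), 3))
```

Solving the normal equations directly keeps the sketch dependency-free; a production model would more likely use a QR-based solver and would also report goodness of fit and prediction intervals.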
The ISO/IEC standard [16] states that “internal metrics are of little value unless there is evidence that they are related to some externally visible quality.” Some internal metrics, such as complexity metrics, have been shown to be useful as early indicators of externally visible product quality [1] because they are related (in a statistically significant and stable way) to the field quality/reliability of the product. Validating such internal metrics requires a convincing demonstration that (1) the metric measures what it purports to measure and (2) the metric is associated with an important external metric, such as field reliability, maintainability, or fault-proneness [12].

Our research objective is to construct and validate a set of easy-to-measure in-process metrics that can be used to create a prediction model of an external measure, system defect density. To this end, we have created a metric suite we call the Software Testing and Reliability Early Warning metric (STREW) suite. Two versions of STREW have been developed so far: one for an object-oriented language (STREW-Java, or STREW-J) [20] and one for a functional programming language (STREW-Haskell, or STREW-H) [23, 24].

In this paper, we present the results of an industrial case study designed to analyze the capabilities of the prediction model created by the STREW-H metric suite. The project studied is an ASN.1 compiler created by Galois Connections, Inc. using the Haskell programming language.

The remainder of the paper is organized as follows. Section 2 describes background work, and Section 3 introduces the STREW metric suite. Section 4 discusses the industrial case study performed with the STREW-H metric suite. Section 5 presents our conclusions and future work.

2. BACKGROUND

In prior research, software metrics have been shown to be indicators of the quality of software products.
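Showing that a metric is an indicator of quality, in the sense of criterion (2) in Section 1, usually means testing for a statistical association between the internal metric and an external measure. A common distribution-free check is Spearman's rank correlation. The Python sketch below computes it from scratch (average ranks for ties, then Pearson correlation on the ranks). It is a generic illustration, not necessarily the validation procedure used for STREW, and the per-module metric values and field defect counts in the example are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: one internal metric value per module, and the number
    # of defects later reported from the field for the same module.
    metric = [12, 30, 7, 22, 15]
    field_defects = [1, 5, 0, 2, 3]
    print("Spearman rho:", round(spearman(metric, field_defects), 3))
```

A strong, stable rho across projects would be evidence for criterion (2); a significance test on rho would normally accompany it.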
Structural object-orientation (O-O) measurements, such as those in the Chidamber-Kemerer (CK) O-O metric suite [8], have been used to evaluate and predict fault-proneness [1, 5, 6]. These O-O metrics can be useful early internal indicators of externally visible product quality [1, 25, 26]. The CK metric suite consists of six metrics: weighted methods per class (WMC), coupling between objects (CBO), depth of inheritance tree (DIT), number of children (NOC), response for a class (RFC), and lack of cohesion in methods (LCOM).

Basili et al. [1] studied fault-proneness in software programs using eight student projects. They observed that WMC, CBO, DIT, NOC, and RFC were correlated with defects, while LCOM was not. Further, Briand et al. [6] performed an industrial case study and observed that CBO, RFC, and LCOM were associated with the fault-proneness of a class. A similar study by Briand et al. [5] on eight student projects showed that classes with higher WMC, CBO, DIT, and RFC were more fault-prone, while classes with more children (NOC) were less fault-prone.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
A-MOST’05, May 15–16, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-59593-115-5/00/0004…$5.00.

Tang et al. [26] studied three real time