Using a Class Abstraction Technique to Predict Faults in OO Classes
A case study through six releases of the Eclipse JDT

Djuradj Babich and Peter J. Clarke
School of Computing and Information Sciences
Florida International University
Miami, FL 33199, USA
{dbabi001, clarkep}@cis.fiu.edu

James F. Power
Department of Computer Science
National University of Ireland
Maynooth, Co. Kildare
jpower@cs.nuim.ie

B. M. Golam Kibria
Department of Mathematics & Statistics
Florida International University
Miami, FL 33199, USA
kibriag@fiu.edu

ABSTRACT
In this paper, we propose an innovative suite of metrics based on a class abstraction that uses a taxonomy for OO classes (CAT) to capture aspects of software complexity through combinations of class characteristics. We empirically validate the ability of these metrics to predict fault-prone classes using fault data for six versions of the Java-based open-source Eclipse Integrated Development Environment. We conclude that the proposed CAT metric suite, even though it treats classes in groups rather than individually, is as effective as the traditional Chidamber and Kemerer metrics in identifying fault-prone classes.

1. INTRODUCTION
Since quantitative methods have demonstrated their usefulness in other sciences, computer science researchers have worked hard to bring similar approaches to software development in the form of software metrics, that is, measures of some property of software code or its specifications. A plethora of OO design metrics has been proposed to help evaluate software design quality [4, 2, 6]. To demonstrate the usefulness of these metrics during the development of commercial applications, numerous empirical validations have also been performed and published in the literature.
Many empirical studies on software metrics have linked quantitative design structures in OO designs to the fault-proneness of classes [9, 10, 13], and most studies consistently use the Chidamber and Kemerer (CK) metric suite as a touchstone in predicting OO software quality [6]. Consequently, we use the CK metrics as a comparison tool in this study.

In this paper, we present similar work from the empirical validation standpoint, but with an innovative suite of metrics based on a Class Abstraction that uses a Taxonomy for OO classes (CAT) to capture aspects of software complexity through combinations of class characteristics. The advantage of our approach is that it significantly reduces the analysis overhead by grouping classes based on these characteristics. A potential disadvantage of our approach is that important information may be lost by using taxonomy entries rather than full analysis data.

We empirically validate the ability of the CAT metrics to predict fault-prone classes using fault data for six versions of the Java-based open-source Eclipse Integrated Development Environment (IDE) [17]. We conclude that our proposed metric suite and the CK metrics both produce statistical models that effectively identify fault-prone classes. Evaluating the generated prediction models across six Eclipse IDE versions suggests that models generated from our proposed metrics are at least as effective in assessing the quality of OO classes as their CK-based counterparts.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC'11 March 21-25, 2011, TaiChung, Taiwan.
Copyright 2011 ACM 978-1-4503-0113-8/11/03 ...$10.00.
The remainder of the paper is organized as follows. The class abstraction using the taxonomy of OO classes is reviewed in Section 2. Details of the empirical study are given in Section 3, and the prediction models are described and analysed in Section 4. Finally, we present the related work in Section 5 and conclude in Section 6.

2. CLASS ABSTRACTION
The class abstraction technique used in this paper is the taxonomy of OO classes previously described by Clarke et al. [7]; in this section we summarise its main elements. The taxonomy of OO classes is defined in terms of class characteristics.

The class characteristics for a given class C are defined as the properties of the features (attributes and methods) in C and the dependencies C has with other types (built-in and user-defined) in the implementation.

The properties of the features in C describe how criteria such as types, accessibility, shared class features, polymorphism, dynamic binding, deferred features, exception handling, and concurrency are represented in the attributes and routines of C.

The dependencies C has with other types are realized through the declarations and definitions of C's features, and through C's role in an inheritance hierarchy.

The artifact generated when a class is cataloged using the taxonomy is referred to as a cataloged entry.
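The idea of cataloging classes and then treating classes with identical entries as one group can be illustrated with a minimal Python sketch. It is not the taxonomy of [7]: the `CatalogedEntry` fields are a hypothetical, heavily simplified subset of the criteria listed above, and the class names and the `group_by_entry` helper are invented for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a cataloged entry. The real taxonomy
# records far more detail than these four yes/no characteristics.
@dataclass(frozen=True)
class CatalogedEntry:
    has_polymorphic_features: bool
    has_deferred_features: bool
    handles_exceptions: bool
    is_in_inheritance_hierarchy: bool


def group_by_entry(classes):
    """Group class names that share an identical cataloged entry."""
    groups = defaultdict(list)
    for name, entry in classes.items():
        groups[entry].append(name)
    return dict(groups)


# Illustrative input: class names and characteristics are made up.
classes = {
    "Parser": CatalogedEntry(True, False, True, True),
    "Scanner": CatalogedEntry(True, False, True, True),
    "Token": CatalogedEntry(False, False, False, False),
}

groups = group_by_entry(classes)
for entry, names in groups.items():
    print(names, entry)
```

In this sketch the first two classes fall into one group because their entries are equal, so any later analysis would handle them together. This mirrors both the stated advantage of the approach (fewer units to analyse) and its stated risk (differences not captured by the entry are lost).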