Empirical comparison of three metrics suites for fault prediction in packages of object-oriented systems: A case study of Eclipse

Mahmoud O. Elish a,⇑, Ali H. Al-Yafei b, Muhammed Al-Mulhem a

a Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
b Arabian Advanced Systems, Al Khobar 31952, Saudi Arabia

Article info

Article history: Received 24 January 2011; Received in revised form 12 April 2011; Accepted 2 June 2011; Available online 29 June 2011

Keywords: Software metrics; Fault prediction; Object-oriented packages; Object-oriented systems; Prediction models; Software quality

Abstract

Packages are important high-level organizational units for large object-oriented systems. Package-level metrics characterize the attributes of packages such as size, complexity, and coupling. There is a need for empirical evidence to support the collection of these metrics and their use as early indicators of some important external software quality attributes. In this paper, three suites of package-level metrics (Martin, MOOD and CK) are evaluated and compared empirically in predicting the number of pre-release faults and the number of post-release faults in packages. Eclipse, one of the largest open source systems, is used as a case study. The results indicate that the prediction models that are based on the Martin suite are more accurate than those that are based on the MOOD and CK suites across releases of Eclipse.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

As object-oriented systems grow in size and complexity, their classes become too finely grained to be used as the sole organizational unit for them [11]. Therefore, there is a need for packages as a mechanism for organizing classes into namespaces. A package is a basic development unit that can be separately created, maintained, released, tested, and assigned to a team [6].
A Java package is defined as follows [12]: “A java package is a group of classes that are related by purpose or by application. Classes in the same package have special access privilege with respect to one another and may be designed to work together closely.” The importance of packages has been recognized by software developers, and they have become a fundamental part of modern object-oriented programming languages [9].

Studying and analyzing packages in object-oriented systems in order to evaluate the quality of these systems is becoming increasingly important as the object-oriented paradigm continues to grow in popularity and as the size and number of packages in these systems increase [7,11,13]. Consequently, several package-level metrics have been proposed and used to characterize the attributes of packages in object-oriented systems. Many of these attributes are related, in one way or another, to the quality of the software system being produced. However, some of these metrics may not actually measure the intended quality attributes of software. Thus, empirical validation is necessary to demonstrate the usefulness of these metrics in practical applications [2], i.e., to explore the relationships between these metrics and some important external software quality attributes.

This paper explores the relationships between three suites of package-level metrics and the numbers of pre-release faults and post-release faults in packages of object-oriented systems. It also empirically evaluates and compares the predictive power of the three metrics suites for pre-release and post-release fault prediction through a case study of Eclipse, which is one of the largest open source systems. The paper therefore contributes novel, evidence-based empirical insights into early prediction of faults in packages of object-oriented systems.
This can help software developers to focus their quality assurance activities (inspection, testing, refactoring, etc.) and to allocate the resources needed for these activities more effectively and efficiently.

The rest of this paper is organized as follows. Section 2 defines the three suites of package-level metrics under investigation. Section 3 describes the empirical comparison study, with discussion and analysis of the results. Section 4 reviews related work. Section 5 concludes the paper and provides directions for future work.

2. The three package-level metrics suites

Three popular suites of metrics were investigated in this study: the Martin suite [11], the MOOD suite [1,10], and the CK suite [3,4], which are described next.

⇑ Corresponding author. E-mail addresses: elish@kfupm.edu.sa (M.O. Elish), ali.alsaadi2@gmail.com (A.H. Al-Yafei), mulhem@kfupm.edu.sa (M. Al-Mulhem).

Advances in Engineering Software 42 (2011) 852–859. doi:10.1016/j.advengsoft.2011.06.001