An Empirical Evaluation of Object Oriented Metrics in Industrial Setting

Giovanni Denaro
Università di Milano Bicocca
Dip. di Informatica, Sistemistica e Comunicazione
I-20126, Milano, Italy
denaro@disco.unimib.it

Luigi Lavazza
CEFRIEL and Politecnico di Milano
Dip. di Elettronica e Informazione
I-20133, Milano, Italy
lavazza@elet.polimi.it

Mauro Pezzè
Università di Milano Bicocca
Dip. di Informatica, Sistemistica e Comunicazione
I-20126, Milano, Italy
pezze@disco.unimib.it

1. INTRODUCTION

Advances in distributed object technologies (e.g., the Common Object Request Broker Architecture [15] and the Enterprise Java Bean Specification [19]) dramatically impact the development process of distributed software applications. In particular, the time needed to provide new distributed services is decreasing because applications are no longer built from scratch. Rather, they are developed on top of pre-existing middle-tier software (middleware) and integrate components and services provided off-the-shelf by third parties [9]. The increasing demand for rapid provision of new products imposes rigid constraints on the quality assurance activities for this class of applications. It becomes crucial to optimize the allocation of resources for testing and analysis so as to meet the required quality goals while reducing time-to-market.

Measuring the fault-proneness of the software may facilitate the allocation of resources for testing and analysis. If the distribution of faults in the software can be accurately estimated in advance, resources can be allocated accordingly, i.e., more resources to the more fault-prone parts of the software. For example, in the case of code inspection, more thorough inspection sessions could be scheduled for the more fault-prone modules. Although fault-proneness cannot be measured directly, it can be estimated from other measurable attributes of the software, relying on expected correlations between such attributes and fault-proneness.
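As an illustration of the allocation idea described above, the following minimal sketch splits a fixed inspection budget across modules in proportion to their estimated fault-proneness. The proportional policy, the function name `allocate_effort`, and the module names and scores are assumptions made for this example only; the text does not prescribe a specific allocation rule.

```python
def allocate_effort(fault_proneness, total_hours):
    """Split total_hours across modules in proportion to estimated fault-proneness.

    fault_proneness: dict mapping a module name to a non-negative score
                     (hypothetical estimates, e.g. produced by a metric-based model).
    total_hours:     inspection budget to distribute.
    """
    if not fault_proneness:
        return {}
    total_score = sum(fault_proneness.values())
    if total_score <= 0:
        # No usable signal: fall back to an even split (an assumption of this sketch).
        even_share = total_hours / len(fault_proneness)
        return {module: even_share for module in fault_proneness}
    return {
        module: total_hours * score / total_score
        for module, score in fault_proneness.items()
    }


# Hypothetical scores for three modules and a 60-hour inspection budget.
plan = allocate_effort({"billing": 3, "routing": 2, "logging": 1}, 60)
for module, hours in plan.items():
    print(f"{module}: {hours:.1f} h")
```

Under this policy a module estimated to be three times as fault-prone as another receives three times the inspection time; other policies, such as inspecting only the top-ranked modules, would fit the same argument.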
Many software metrics have been proposed for this purpose (e.g., [14, 10, 20]). However, the best predictors of fault-proneness may vary according to the class of applications and the target application domain, as demonstrated by many empirical studies [13, 18, 16, 17, 7, 6].

In the nineties, researchers started investigating software metrics that capture the specific complexity of object-oriented systems [5, 11, 4, 2]. Object-Oriented (OO) metrics capture characteristics of class hierarchies, of the internal cohesion of classes, and of the degree of coupling between different classes. A preliminary set of empirical studies supports the hypothesis that OO metrics are more closely related to external attributes of object-oriented systems, such as fault-proneness and maintainability, than traditional metrics [1, 8, 3]. However, these studies were conducted on small applications containing fewer than 100 classes, which may not adequately represent large-scale industrial systems.

This paper reports an empirical study on a large object-oriented industrial system for telecommunications. The target software consists of more than 2,000,000 lines of code organized in 3,344 modules, each containing one or more C++ classes. We analyzed the relationships between the OO metrics defined by Chidamber and Kemerer in [5] and fault-proneness across three different versions of this target application. The results of the experiment suggest that OO metrics do not outperform traditional metrics as fault-proneness predictors for large industrial systems: in our experiments, none of the OO metrics we examined appears to be a better predictor of fault-proneness than lines of code (LOC).

This work has been partially funded by the Italian Government in the context of the QUACK project (QUACK: A Platform for the Quality of New Generation Integrated Embedded Systems).

2. EMPIRICAL STUDY

The empirical study was conducted on an industrial telecommunication application.
For confidentiality reasons, we can disclose neither the functional details nor the producer of this application. In what follows, we refer to this application as RC.

2.1 Target Data

We collected data from three different versions of RC, referred to as RCx.01, RCx.02 and RCx+1.0, respectively. These versions represent the evolution of the same software over time. At the time of this writing, all three versions operate in the field and all of them are maintained by the company as branching versions. The three versions differ significantly from each other in terms of offered functionality.

Each version of RC consists of more than 2 million lines of C++ code. The set of source files common to the three versions (common core files) accounts for 93.3% of the files in RCx.01, 92.9% in RCx.02 and 86.5% in RCx+1.0. This suggests that the evolution of RC largely favored modification and adaptation of available code over implementation of new functionality in separate modules. To the end of analyzing