International Journal of Computer Applications (0975 – 8887)
Volume 74 – No.8, July 2013

Cross Company and within Company Fault Prediction using Object Oriented Metrics

Pradeep Singh, National Institute of Technology
Shrish Verma, National Institute of Technology
O P Vyas, Indian Institute of Information Technology

ABSTRACT
This paper investigates fault prediction in the cross-project context, focusing on object-oriented metrics, for organizations that do not track fault-related data. In this study, an empirical analysis is carried out to validate the object-oriented Chidamber and Kemerer (CK) design metrics for cross-project fault prediction. The machine learning techniques used for evaluation are J48, NB, SVM, RF, K-NN and DT. The results indicate that CK metrics can be used as an initial guideline for projects where no previous fault data is available. Overall, the cross-company results are comparable to those of within-company learning. Our analysis favours reuse in object-oriented technology, and it empirically shows that object-oriented metric data can be used for cross-company fault prediction at an initial stage, when previous fault data for the project is not available.

Keywords
Fault prediction, cross company, software metric, open source software.

1. INTRODUCTION
Modern software engineering and programming practices stress developing similar products from reusable core assets. Modular or component-based design, architecture and implementation are the basic techniques for building software from new or reusable parts. The general aim of reuse is to improve quality and to reduce development effort and time. Software reuse can yield substantial savings in development cost and can produce lower-complexity end products that are relatively small in size.
To increase productivity and quality, organizations develop a module once, verify that it functions correctly, and then reuse it in other applications that require the same functionality. In the age of open source development, organizations can also draw on reusable components from outside the organization, not only from within it. Software reuse is arguably the most significant aspect of object-oriented information system development, so it is reasonable to use cross-company data from projects developed with an object-oriented methodology. According to Kitchenham et al. [1]: “The time required to collect enough data on past projects from within a company may be prohibitive. Collecting within-company data may take so long that technologies change and older projects do not represent current practice.” The object-oriented development methodology is widely used in the software industry, and many object-oriented design metrics have been proposed for fault prediction, but no cross-company investigation of them has been reported so far. In this study, an empirical analysis is carried out to validate object-oriented design metrics for cross-project fault prediction. The Chidamber and Kemerer (CK) metrics suite is adopted to predict faults in projects using within-company and cross-company data. We use CK metrics from software developed by different organizations in different object-oriented languages. The evaluation uses statistical and machine learning techniques: J48, NB, SVM, RF, K-NN and DT. The results indicate that CK metrics can be used as an initial guideline for projects where no fault data is available. Overall, the cross-company results are comparable to those of within-company learning.
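To make the two evaluation settings concrete, the sketch below contrasts cross-company learning (train on one organization's project, test on another's) with within-company learning (k-fold cross-validation on the target project alone). It is a minimal illustration under stated assumptions, not the authors' experimental setup: a hand-written Gaussian Naive Bayes stands in for the NB learner listed above, accuracy is the only measure computed, and the hypothetical `synthetic_project` generator produces toy values for the six CK metrics rather than real project data.

```python
import math
import random
from typing import Callable, List, Tuple

# One module = (CK metric vector, label); label 1 = fault-prone, 0 = not fault-prone.
Dataset = List[Tuple[List[float], int]]


class GaussianNB:
    """Minimal Gaussian Naive Bayes: one normal distribution per metric per class."""

    def fit(self, data: Dataset) -> "GaussianNB":
        self.stats = {}
        n = len(data)
        for label in {y for _, y in data}:
            rows = [x for x, y in data if y == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # A small variance floor avoids division by zero for constant metrics.
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + 1e-6
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / n), means, variances)
        return self

    def predict(self, x: List[float]) -> int:
        def log_posterior(label: int) -> float:
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances))
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.stats, key=log_posterior)


def accuracy(model: GaussianNB, test: Dataset) -> float:
    return sum(model.predict(x) == y for x, y in test) / len(test)


def cross_company(source: Dataset, target: Dataset,
                  make_model: Callable[[], GaussianNB] = GaussianNB) -> float:
    """Train on another organization's project, test on every module of the target."""
    return accuracy(make_model().fit(source), target)


def within_company(target: Dataset, k: int = 5, seed: int = 0,
                   make_model: Callable[[], GaussianNB] = GaussianNB) -> float:
    """k-fold cross-validation using only the target project's own data."""
    rows = target[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(make_model().fit(train), test))
    return sum(scores) / k


def synthetic_project(n: int, shift: float, seed: int) -> Dataset:
    """Hypothetical toy data: [WMC, DIT, NOC, CBO, RFC, LCOM] per class.
    Fault-prone classes get larger size, coupling and lack-of-cohesion values;
    `shift` offsets WMC to mimic a project from a different organization."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        base = 20.0 if y else 8.0
        raw = [rng.gauss(base + shift, 3), rng.gauss(2, 1), rng.gauss(1, 1),
               rng.gauss(base / 2, 2), rng.gauss(base * 3, 6), rng.gauss(base * 2, 5)]
        data.append(([max(0.0, v) for v in raw], y))  # metrics are non-negative
    return data


if __name__ == "__main__":
    source = synthetic_project(100, shift=0.0, seed=1)  # e.g. another company's project
    target = synthetic_project(100, shift=2.0, seed=2)  # the project without fault history
    print(f"cross-company accuracy:  {cross_company(source, target):.2f}")
    print(f"within-company accuracy: {within_company(target):.2f}")
```

Because both helpers take a `make_model` factory, any other classifier object exposing `fit` and `predict` could be dropped in to repeat the same comparison for the other learners; Naive Bayes is used here only because it fits in a few lines without external libraries.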
Software fault prediction, which applies various techniques to software repositories to identify fault-prone modules, is of great interest to software testing researchers and industry professionals because it can reduce the cost of software testing. Researchers have used metric-based classification to label software components as fault-prone or non-fault-prone [6, 7], relying on static design metrics of the programs for this purpose. Many researchers have explored the relative merits of McCabe's cyclomatic complexity, Halstead's software science measures, and lines-of-code counts for building fault predictors [6, 7, 16]. Once object-oriented programming came to dominate software development, a wide variety of design metrics were adapted for estimating the quality of object-oriented programs. Chidamber and Kemerer [4] introduced their OO design and complexity metrics and demonstrated their strong impact on software quality. The CK metrics suite generated great enthusiasm among researchers and software engineers, and a large number of empirical studies have been conducted to evaluate these metrics. In this study, industrial data is used to analyze the relationships between CK metrics and faults in OO programs. Metric data can be computed with automatic tools, but bug data is much harder to collect. In the present work, we therefore reuse the fault data of one project to generate a prediction model for another project. To achieve this, metrics and bug data are computed from C++ and Java projects, and the CK metrics available for both projects are selected to build the fault prediction model.
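The metric-selection step described above can be sketched as follows. This is a minimal illustration under assumptions the paper does not state: each project is a list of per-class rows keyed by metric name, the bug count lives in a hypothetical `bugs` column, a class is labelled fault-prone when it has at least one recorded bug, and the example rows are invented toy values rather than data from the studied C++ and Java projects.

```python
from typing import Dict, List, Tuple

# The six metrics of the Chidamber and Kemerer suite.
CK_METRICS = ["WMC", "DIT", "NOC", "CBO", "RFC", "LCOM"]

Row = Dict[str, float]


def common_ck_metrics(source: List[Row], target: List[Row]) -> List[str]:
    """CK metrics present in every row of both projects, kept in suite order."""
    available = set(CK_METRICS)
    for row in source + target:
        available.intersection_update(row.keys())
    return [m for m in CK_METRICS if m in available]


def to_dataset(rows: List[Row], metrics: List[str],
               bug_column: str = "bugs") -> List[Tuple[List[float], int]]:
    """Project rows onto the shared metrics; label 1 = fault-prone (at least one bug)."""
    return [([float(r[m]) for m in metrics], int(r[bug_column] > 0)) for r in rows]


# Hypothetical toy rows: one dict per class, as a metrics tool might export them.
cpp_project = [
    {"WMC": 12, "DIT": 1, "NOC": 0, "CBO": 5, "RFC": 30, "LCOM": 20, "bugs": 2},
    {"WMC": 4, "DIT": 2, "NOC": 1, "CBO": 2, "RFC": 9, "LCOM": 3, "bugs": 0},
]
java_project = [  # this tool also exports LOC, which is not a CK metric
    {"WMC": 7, "DIT": 3, "NOC": 0, "CBO": 4, "RFC": 18, "LCOM": 6, "LOC": 210, "bugs": 1},
    {"WMC": 3, "DIT": 1, "NOC": 2, "CBO": 1, "RFC": 7, "LCOM": 0, "LOC": 60, "bugs": 0},
]

if __name__ == "__main__":
    shared = common_ck_metrics(cpp_project, java_project)
    train = to_dataset(cpp_project, shared)   # fault data reused from the C++ project
    test = to_dataset(java_project, shared)   # project whose faults we want to predict
    print(shared)
    print(train)
    print(test)
```

Taking the intersection keeps the source and target feature vectors aligned even when the tools for the two languages export different column sets; the resulting lists have the (metric vector, label) shape that a classifier can consume directly.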
Fault prediction models focus on predicting fault-prone modules precisely, helping software managers and testers allocate limited resources to testing and maintenance. Studies on this issue have usually trained predictors on data from historical releases of the same project (i.e., fault distribution data and software metrics such as static code features, code change histories, and process metrics) and predicted faults in upcoming releases, or reported cross-validation results on the same data set. To build a fault predictor, we need to extract fault and source-code data from the repositories of the same project; this forms the training data for the predictor. In real practice, however, such historical fault data is not always accessible, because either it does not yet exist due