FUOYE Journal of Engineering and Technology, Volume 3, Issue 2, September 2018
ISSN: 2579-0625 (Online), 2579-0617 (Paper)
FUOYEJET © 2018
http://dx.doi.org/10.46792/fuoyejet.v3i2.200
engineering.fuoye.edu.ng/journal

Software Defect Prediction Using Ensemble Learning: An ANP Based Evaluation Method

*Abdullateef O. Balogun, Amos O. Bajeh, Victor A. Orie and Ayisat W. Yusuf-Asaju
Department of Computer Science, University of Ilorin, Nigeria
{balogun.ao1|bajehamos|yusuf.aaw}@unilorin.edu.ng|orievictor123@gmail.com

Abstract— Software defect prediction (SDP) is the process of predicting defects in software modules; it identifies the modules that are defective and require extensive testing. Classification algorithms that help to predict software defects play a major role in the software engineering process. Some studies have shown that ensembles are often more accurate than single classifiers. However, findings vary across studies, and some posit that the effectiveness of learning algorithms may differ depending on the performance measure used. This is because most studies on SDP consider the accuracy of the model or classifier above other performance metrics. This paper evaluated the performance of single classifiers (SMO, MLP, kNN and Decision Tree) and ensembles (Bagging, Boosting, Stacking and Voting) in SDP, considering major performance metrics, using the Analytic Network Process (ANP) multi-criteria decision method. The experiment was based on 11 performance metrics over 11 software defect datasets. Boosted SMO, Voting and Stacking ensemble methods ranked highest, with priority levels of 0.0493, 0.0493 and 0.0445 respectively. Decision Tree ranked highest among the single classifiers, with 0.0410. These results show that ensemble methods can give better classification results in SDP, and that the Boosting method gave the best result.
In essence, before deciding which model or classifier is better for software defect prediction, all performance metrics should be considered.

Keywords— Data mining, Machine Learning, Multi Criteria Decision Making, Software Defect Prediction

1 INTRODUCTION

Software engineering is an engineering discipline that is concerned with all aspects of producing software, from the early stages of software specification through to maintaining the system after it has gone into use (Lan, 2009). In any area of software engineering, errors are mostly inescapable and they can lead to defects in software. During the development process, software defects are usually discovered during software testing (Hui, 2014). A software defect is an error or flaw in a software program or system that causes it to produce an unwanted result. A software defect can also arise when the final software product does not meet the customer requirement or user expectation (Aruna, Radhika, & Swathi, 2016). Defects can increase the cost of software development and decrease the overall quality of the software product.

Over the years, researchers have developed classification models for the prediction of defects in software. Some studies showed that ensemble methods perform better than single classifiers in software defect prediction (Yi, Gang, Guoxun, Wenshuai, & Yong, 2011; Lessman, Baesans, Meus, & Pietsch, 2008), while other works indicated that single classifiers perform better (Bowes, Hall & Petrić, 2017; Aleem, Capretz & Ahmed, 2015). This study aims to evaluate the performance of ensemble and classification models using the Analytic Network Process (ANP), which is a multi-criteria decision-making technique.

The rest of this paper is organized as follows: Section 2 presents a review of related works. Section 3 discusses the theoretical background of the study.
It presents the classifiers, the feature selection method, the ensemble methods and ANP. Section 4 presents the research method used in the experiment and analyzes the results. Section 5 presents results and discussion. Section 6 concludes the paper and presents some recommendations based on the results of the study.

* Corresponding Author

2 RELATED WORKS

A lot of work has been carried out on software defect prediction; this section highlights research work involving defect prediction, feature selection, ensemble learning and multi-criteria decision-making (MCDM).

Aleem, Capretz & Ahmed (2015) examined different machine learning methods that can be used for defect prediction and analyzed the performance of different algorithms on various software datasets. SVM and MLP techniques performed well on bug datasets. To select the appropriate method for bug prediction, domain experts have to consider various factors such as the type of datasets, the problem domain, uncertainty in the datasets and the nature of the project.

Feature selection has also been applied by researchers to software defect prediction. Ghotra, McIntosh, & Hassan (2017) studied 30 feature selection techniques and 21 classification techniques applied to 18 datasets from the NASA and PROMISE corpora. Their results showed that a correlation-based filter-subset feature selection technique with a BestFirst search method outperformed other feature selection techniques across the studied datasets and classification techniques. They recommended the application of such a selection technique when building defect classification models. Issam, Mohammad, & Lahouari (2014) examined the effect of combining feature selection and ensemble learning on the performance of defect classification.
They combined selected ensemble learning models with efficient feature selection on the datasets, based on defect classification performance measures. The results of their study showed that the features of a software defect dataset must be carefully selected for precise classification of defective modules.

In another study, Yi et al. (2010) incorporated a set of MCDM methods to rank classification algorithms. The study used four MCDM methods to rank 38 classification algorithms based on 13 evaluation criteria S