Single and Multi-Objective Genetic Programming for Software Development Effort Estimation

Federica Sarro, Filomena Ferrucci, Carmine Gravino
University of Salerno, Via Ponte Don Melillo, 84084 Fisciano (SA), Italy
{fsarro, fferrucci, gravino}@unisa.it

ABSTRACT
The idea of exploiting Genetic Programming (GP) to estimate software development effort is based on the observation that the effort estimation problem can be formulated as an optimization problem: among the possible models, we have to identify the one providing the most accurate estimates. To this end, a suitable measure to evaluate and compare different models is needed. However, in the context of effort estimation there is no unique measure that allows us to compare different models; rather, several different criteria (e.g., MMRE, Pred(25), MdMRE) have been proposed. To gain insight into the effects of using different measures as fitness function, in this paper we analyzed the performance of GP using each of the five most widely used evaluation criteria. Moreover, we designed a Multi-Objective Genetic Programming (MOGP) approach based on Pareto optimality to simultaneously optimize the five evaluation measures, and analyzed whether MOGP is able to build estimation models more accurate than those obtained using GP. The results of the empirical analysis, carried out using three publicly available datasets, showed that the choice of the fitness function significantly affects the estimation accuracy of the models built with GP, and that some fitness functions allowed GP to achieve estimation accuracy comparable with that provided by MOGP.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management - Cost Estimation.

General Terms
Management, Measurement.
Keywords
Genetic Programming, Multi-Objective Search, Effort Estimation, Empirical Study.

1. INTRODUCTION
Effort estimation is a critical activity for planning and monitoring software project development and for delivering the product on time and within budget. Several methods have been proposed to address the problem. In particular, data-driven approaches exploit data from past projects, consisting of both factor values that are related to effort and the actual effort to develop the projects, to construct an estimation model that is used to predict the effort for a new project under development [3][28]. In this class we can include search-based approaches [15]. These are meta-heuristics able to find optimal or near-optimal solutions to problems characterized by large search spaces, and they have turned out to be effective in solving numerous optimization problems in several contexts. Examples of search-based methods are Simulated Annealing, Tabu Search, Genetic Algorithms, and Genetic Programming [15].

The idea of exploiting these methods to estimate development effort is based on the observation that the effort estimation problem can be formulated as an optimization problem. As a matter of fact, among the possible estimation models, we have to identify the best one, i.e., the one providing the most accurate estimates.

The investigations carried out so far on the use of search-based approaches for effort estimation have mainly focused on Genetic Programming (GP), providing promising results [5][12][13][21][27]. Nevertheless, the design of these techniques deserves to be further explored and empirically analyzed. In particular, a crucial design choice is the definition of the fitness function, which indicates how suitable a solution is for the problem under investigation, thereby driving the search towards optimal solutions. For the effort estimation problem, the fitness function should be able to assess the accuracy of estimation models.
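As a concrete illustration of this role, the following minimal sketch (illustrative, not the paper's implementation; the candidate model and the toy data are hypothetical) evaluates a candidate effort model by the MMRE it achieves on a set of past projects, the value a GP search driven by MMRE would minimize:

```python
# Illustrative sketch: a candidate solution is a function mapping a project's
# size to a predicted effort; its fitness is the MMRE it achieves on past
# projects (lower is better), so a GP search would minimize this value.

def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)

def fitness(candidate, sizes, actual_efforts):
    """Fitness of a candidate model on the training projects."""
    predictions = [candidate(s) for s in sizes]
    return mmre(actual_efforts, predictions)

# Hypothetical linear candidate: effort = 2.5 * size, on three toy projects.
model = lambda size: 2.5 * size
print(fitness(model, [10, 20, 40], [30, 45, 110]))
```

Swapping `mmre` for any other accuracy measure changes the criterion that drives the search, which is exactly the design choice the paper investigates.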
It is worth noting that several different accuracy measures have been proposed for assessing the effectiveness/accuracy of effort prediction models. Among them, the Mean Magnitude of Relative Error (MMRE) and the Prediction at level 25 (Pred(25)) are the most widely used [8]. Each measure focuses on a specific aspect; as a matter of fact, "Pred(25) measures how well an effort model performs, while MMRE measures poor performance" [23]. It could be argued that the choice of the criterion for assessing predictions and establishing the best model can be a managerial issue: one project manager could prefer Pred(25) as the criterion for judging the quality of a model, while another might prefer a different criterion, for example MMRE [14]. On the other hand, in order to get a more reliable assessment of estimation methods, several evaluation criteria (i.e., MMRE, Pred(25), MdMRE, MEMRE), covering different aspects of model performance (e.g., underestimation or overestimation, success or poor performance), are usually used jointly [9][14][22].

From both points of view, search-based methods represent an opportunity. Indeed, they allow any measure able to evaluate some properties of interest to be used as fitness function [16], so a project manager can select his/her preferred accuracy measure and have the search for the model driven by that criterion. On the other hand, some search-based techniques have been conceived to

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC'12, March 26-30, 2012, Riva del Garda, Italy.
Copyright 2012 ACM 978-1-4503-0857-1/12/03…$10.00.
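For reference, the evaluation criteria named above can be computed as in the following sketch (a minimal illustration using their standard definitions: MRE = |actual − predicted| / actual; EMRE normalizes by the prediction instead of the actual; Pred(25) is the fraction of estimates with MRE ≤ 0.25; the toy data are hypothetical):

```python
from statistics import mean, median

def evaluate(actuals, predictions):
    """Compute MMRE, MdMRE, Pred(25), and MEMRE for a set of estimates."""
    mres = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    emres = [abs(a - p) / p for a, p in zip(actuals, predictions)]
    return {
        "MMRE": mean(mres),       # mean magnitude of relative error
        "MdMRE": median(mres),    # median MRE, robust to outlier projects
        "Pred(25)": sum(m <= 0.25 for m in mres) / len(mres),  # share within 25%
        "MEMRE": mean(emres),     # mean MRE normalized by the estimate
    }

# Toy example: three projects with actual vs. predicted effort.
print(evaluate([100, 200, 300], [110, 150, 330]))
```

Because the criteria disagree on what "accurate" means (e.g., MMRE and MEMRE normalize by different quantities), a model that scores best on one need not score best on another, which motivates the multi-objective formulation.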