On the Role of Diversity Measures for Multi-objective Test Case Selection

Andrea De Lucia¹, Massimiliano Di Penta², Rocco Oliveto³, Annibale Panichella¹
¹ University of Salerno, via Ponte don Melillo, Fisciano (SA), 84084 Salerno, Italy
² University of Sannio, Palazzo ex Poste, Via Traiano, 82100 Benevento, Italy
³ University of Molise, Contrada Fonte Lappone, 86090 Pesche (IS), Italy
Emails: adelucia@unisa.it, dipenta@unisannio.it, rocco.oliveto@unimol.it, apanichella@unisa.it

Abstract: Test case selection has recently been formulated as a multi-objective optimization problem that tries to satisfy conflicting goals, such as code coverage and computational cost. This paper introduces the concept of asymmetric distance preserving, which is useful for improving the diversity of the non-dominated solutions produced by multi-objective Pareto-efficient genetic algorithms, and proposes two techniques to achieve this objective. Results of an empirical study conducted on four programs from the SIR benchmark show that the proposed techniques (i) obtain non-dominated solutions with a higher diversity than the previously proposed multi-objective Pareto genetic algorithms; and (ii) improve the convergence speed of the genetic algorithms.

Keywords: Search-based Software Testing; Test Case Selection; Niched Genetic Algorithms; Empirical Studies.

I. INTRODUCTION

Regression testing is the process of validating modified software to detect whether new errors have been introduced into unchanged parts of the software and to ensure that the changed parts behave as intended. A complete re-testing of the changed system might be too expensive, especially if such a system is very large.
Therefore, it is crucial to perform (i) test case selection, i.e., to determine a subset of test cases that is able to satisfy a given testing adequacy criterion, and (ii) test case prioritization, i.e., to rank test cases so that those with the highest likelihood of revealing faults are executed first. Over the years, several techniques for test case selection and prioritization have been proposed [1], [2], [3], [4], [5], [6], [7].

Regression testing should pursue two contrasting goals: (i) re-test the unchanged parts of the software system on the basis of the test requirements denoted by a test criterion; and (ii) reduce the regression testing cost, i.e., the number of test cases to execute. Most of the existing approaches (see, e.g., [1], [5]) have been developed by considering a single objective only (e.g., test suite minimization), while fixing a constraint on the other objectives (e.g., test adequacy).

Recently, Yoo and Harman [8] treated the problem of test case selection as a Pareto-efficient multi-objective optimization problem, considering cost and coverage as two conflicting objectives. Specifically, they applied two multi-objective search-based optimization techniques, the Non-dominated Sorting Genetic Algorithm (NSGA-II) [9] and an island GA variant of NSGA-II, named vNSGA-II. An empirical study indicated that, in some cases, the search-based multi-objective approach was able to outperform the previously proposed greedy approach [4], and that greedy and multi-objective approaches can be combined to achieve better solutions.

This paper, building upon the work of Yoo and Harman, aims at enhancing the vNSGA-II algorithm by increasing population diversity in the obtained Pareto fronts.
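To make the Pareto-efficient formulation concrete, the following minimal Python sketch shows how dominance between candidate test subsets could be checked when cost is minimized and coverage is maximized. The `Solution` tuple layout, the function names, and the sample values are assumptions made for illustration; they are not taken from Yoo and Harman's implementation.

```python
from typing import List, Tuple

# Hypothetical representation: (cost, coverage) of a selected test subset.
# Cost is to be minimized, coverage is to be maximized.
Solution = Tuple[float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one of them."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Illustrative values only (cost in arbitrary units, coverage as a fraction).
candidates = [(3.0, 0.60), (5.0, 0.80), (5.0, 0.70), (9.0, 0.95), (10.0, 0.90)]
front = pareto_front(candidates)
```

In this example, (5.0, 0.70) is dominated by (5.0, 0.80) and (10.0, 0.90) is dominated by (9.0, 0.95), so three trade-off solutions remain. None of them can be improved on one objective without worsening the other.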
When solving a multi-objective problem using a GA, there is a risk that solutions are biased towards the solutions of the sub-problems and, in the specific case of multi-objective GAs, towards the creation of a limited number of groups of solutions (niches). This would likely compromise the quality (in terms of high code coverage and low testing cost) of the produced solutions.

To mitigate this problem, we propose two approaches aimed at ensuring population diversity in vNSGA-II. The first approach is based on fitness sharing, which penalizes solutions in crowded areas [10], while the second approach partitions the Pareto front and applies a density function to ensure a uniform distribution of solutions over the various partitions. The standard fitness sharing is customized for the test case selection problem by encouraging diversity for only one of the objective functions (i.e., only for coverage but not for cost), while the density function on the coverage space is an alternative asymmetric distance preserving mechanism, introduced for the first time in this paper, that reaches the same goal in a different way.

The benefits provided by the two proposed approaches for solution diversity have been evaluated on four programs from the Siemens benchmark¹, namely printtokens, printtokens2, schedule and schedule2. Specifically, we have compared the vNSGA-II algorithm proposed by Yoo and Harman with the fitness sharing vNSGA-II and the density-based vNSGA-II. Results indicate that the two variants of vNSGA-II proposed in this paper outperform the original version of vNSGA-II in terms of convergence speed and optimality of the achieved solutions.

¹ Available at http://esquared.unl.edu/sir/.

978-1-4673-1822-8/12/$31.00 © 2012 IEEE. AST 2012, Zurich, Switzerland
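The two diversity mechanisms outlined above can be illustrated with a short Python sketch in which distances are measured on coverage only, so cost plays no role in the diversity measure. This section does not give the exact formulation, so the triangular sharing function, the niche radius `sigma`, the equal-width partitions, and all function names are assumptions, and coverage is assumed to be normalized to [0, 1].

```python
from typing import List


def sharing(distance: float, sigma: float, alpha: float = 1.0) -> float:
    """Classic triangular sharing function: 1 at distance 0,
    decreasing to 0 at the niche radius sigma (assumed form)."""
    if distance < sigma:
        return 1.0 - (distance / sigma) ** alpha
    return 0.0


def niche_counts(coverages: List[float], sigma: float) -> List[float]:
    """Niche count of each solution, using coverage distance only.
    Cost is deliberately ignored, which makes the measure asymmetric."""
    return [sum(sharing(abs(ci - cj), sigma) for cj in coverages)
            for ci in coverages]


def shared_fitness(raw: List[float], counts: List[float]) -> List[float]:
    """Penalize crowded solutions by dividing raw fitness by niche count."""
    return [f / m for f, m in zip(raw, counts)]


def partition_density(coverages: List[float], n_partitions: int) -> List[int]:
    """Count solutions in equal-width partitions of the coverage range [0, 1]."""
    density = [0] * n_partitions
    for c in coverages:
        index = min(int(c * n_partitions), n_partitions - 1)
        density[index] += 1
    return density


# Illustrative coverage values: three crowded solutions and one isolated one.
coverages = [0.50, 0.52, 0.54, 0.90]
counts = niche_counts(coverages, sigma=0.1)
penalized = shared_fitness([1.0, 1.0, 1.0, 1.0], counts)
density = partition_density(coverages, n_partitions=4)
```

With these values the three solutions clustered around 0.5 coverage receive niche counts above 2 and are penalized, while the isolated solution at 0.9 keeps its raw fitness. The partition counts expose the same crowding as a histogram over the coverage space.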