How Many Individuals to Use in a QA Task with Fixed Total Effort?

Mika V. Mäntylä
Lund University, Department of Computer Science, 22100 Lund, Sweden
mika.mantyla@cs.lth.se

Kai Petersen
Blekinge Institute of Technology, School of Computing, 37140 Karlskrona, Sweden
kai.petersen@bth.se

Dietmar Pfahl
Lund University, Department of Computer Science, 22100 Lund, Sweden
dietmar.pfahl@cs.lth.se

ABSTRACT
Increasing the number of persons working on quality assurance (QA) tasks, e.g., reviews and testing, increases the number of defects detected, but it also increases the total effort unless effort is controlled with fixed effort budgets. Our research investigates how QA tasks should be configured with regard to two parameters: time and number of people. We define an optimization problem to answer this question. As a core element of the optimization problem, we discuss and describe how defect detection probability should be modeled as a function of time. We apply the formulas used in the definition of the optimization problem to empirical defect data from an experiment previously conducted with university students. The results show that the optimal number of persons depends on the actual detection probabilities of the individual defects over time, and also on the size of the effort budget. Future work will focus on generalizing the optimization problem to a larger set of parameters, including not only task time and number of persons but also the experience and knowledge of the personnel involved, and the methods and tools applied when performing a QA task.

Categories and Subject Descriptors
K.6.3 [Software Management]: Software process

General Terms
Measurement, Economics, Human Factors, Management

Keywords
Effectiveness, Fixed Effort Budget, Effort, Review, People

1. INTRODUCTION
"Given enough eyeballs, all bugs are shallow" is known as Linus's Law, named after Linus Torvalds [1].
The statement claims that if we increase the number of people performing quality assurance (QA) tasks, we find an increasing number of bugs, and that if we could add people endlessly, all bugs would eventually be found. Whether this statement is completely true is debatable. However, it illustrates the fact that using a larger group of people in a QA task increases the number of defects found in comparison with a smaller group. For example, data by Jones [2] indicates that beta testing is the most effective QA measure when a high number of sites is available (>1000). Furthermore, research shows that large groups can be beneficial: e.g., the software inspection data in [3] show that the number of defects found keeps increasing as inspectors are added, even beyond 20 people. We witnessed a similar pattern with manual software testing in our previous research [4].

However, the problem with using large groups in QA tasks is the increasing personnel cost. One can control this problem by limiting the effort budgets for QA tasks. The question to be answered when doing this is how to divide the effort. For example, assume we have an effort budget of 10 person-hours for a software review. How many people should we use? Should we have one person working for ten hours or ten persons working for one hour each? Questions of this nature have received limited attention in prior research on software testing and reviews, which has focused more on the different techniques and tools to use.

In this paper, we continue our previous work [4] on understanding how many individuals to use in a QA task when having a fixed effort budget. In this paper, a QA task is any task where the primary goal is to find faults in a product under scrutiny. Section 2 presents the relevant prior work. Then, in Section 3, we discuss implications and present extensions based on prior work.
Section 4 models defect detection as a function of time, by first formulating defect detection with a fixed effort budget as an optimization problem, and then applying this optimization problem to experimental data. Section 5 discusses the results and possible future work. Finally, Section 6 presents conclusions.

2. PRIOR WORK
In prior work, Biffl et al. describe how inspection team performance can be statistically estimated from individual inspector performances [3, 5]. For example, assume we have performed an experiment A with 40 participants and 10 of them found a particular defect d1. Then the detection probability for this defect is on average 0.25 for a single individual picked randomly from that population. Furthermore, if we pick two individuals, it follows, assuming the two detect the defect independently, that the detection probability for this defect is 0.4375 = 1 − (1 − 0.25)^2.

We can also pick individuals from populations using different techniques and combine the results, as originally suggested by Biffl et al. This idea can be extended to other populations as well, e.g., ones having different time budgets or different experience. In Section 4 of this paper we discuss the case of fixed time budgets.

To illustrate the case of using different techniques, let us assume we perform an experiment B with 40 participants, using a different technique than in experiment A, and this time 20 individuals find defect d1, suggesting an average detection probability of 0.5. From this we can calculate the detection probability of a group consisting of one inspector from each of the populations A and B as 0.625 = 1 − (1 − 0.25) ∗ (1 − 0.5).
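The two worked examples above can be checked numerically. The following is a minimal sketch, not taken from the original study: the function name `group_detection_probability` is hypothetical, and the calculation assumes that group members detect the defect independently of one another.

```python
from math import prod


def group_detection_probability(individual_probs):
    """Probability that at least one group member finds the defect.

    Assumption (not verified here): members detect independently, so the
    chance that everyone misses is the product of the miss probabilities.
    """
    return 1 - prod(1 - p for p in individual_probs)


# Individual detection probabilities for defect d1, from the text.
p_a = 10 / 40  # experiment A: 10 of 40 participants found d1
p_b = 20 / 40  # experiment B: 20 of 40 participants found d1

two_from_a = group_detection_probability([p_a, p_a])
one_from_each = group_detection_probability([p_a, p_b])

print(two_from_a)     # 0.4375
print(one_from_each)  # 0.625
```

Passing a list of per-person probabilities, rather than a single probability and a group size, lets the same function cover both homogeneous groups and groups mixed from different populations.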
In more formal terms, the probability P(d) that a group of size n finds a given defect d is calculated as follows:

\[ P(d) = 1 - \prod_{i \in \{1, \dots, n\}} \bigl(1 - p_i(d)\bigr) \tag{1} \]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ESEM2012, Sep 17-18, 2012, Lund, Skåne, Sweden.
Copyright 2010 ACM 1-58113-000-0/00/0010 …$15.00.