Sample Size under Inverse Negative Binomial Group Testing for Accuracy in Parameter Estimation

Osval Antonio Montesinos-López 1*, Abelardo Montesinos-López 2, José Crossa 3*, Kent Eskridge 4

1 Facultad de Telemática, Universidad de Colima, Colima, Colima, México, 2 Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Guanajuato, México, 3 Biometrics and Statistics Unit, Maize and Wheat Improvement Center (CIMMYT), Mexico D.F., Mexico, 4 Department of Statistics, University of Nebraska, Lincoln, Nebraska, United States of America

Abstract

Background: The group testing method has been proposed for the detection and estimation of genetically modified plants (adventitious presence of unwanted transgenic plants, AP). For binary response variables (presence or absence), group testing is efficient when the prevalence is low, so estimation, detection, and sample size methods have been developed under the binomial model. However, when the event is rare (prevalence < 0.1) and testing occurs sequentially, inverse (negative) binomial pooled sampling may be preferred.

Methodology/Principal Findings: This research proposes three sample size procedures (two computational and one analytic) for estimating prevalence using group testing under inverse (negative) binomial sampling. These methods provide the required number of positive pools (r_m), given a pool size (k), for estimating the proportion of AP plants using the Dorfman model and inverse (negative) binomial sampling. We give real and simulated examples to show how to apply these methods and the proposed sample size formula. The Monte Carlo method was used to study the coverage and level of assurance achieved by the proposed sample sizes. An R program to create other scenarios is given in Appendix S2.
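To make the sample size idea concrete, here is a minimal Python sketch of how a number of positive pools r could be chosen by simulation. It is not the authors' R program from Appendix S2 and does not reproduce methods 1 to 3; it simply searches upward for the smallest r whose Monte Carlo assurance, the estimated probability that the CI width W is at most a target width ω, reaches a level γ. It assumes a perfect assay, a pool-positive probability θ = 1 − (1 − p)^k, the maximum likelihood estimate θ̂ = r/N (N = total pools tested until the rth positive), and an untruncated delta-method Wald interval for p. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pool_positive_prob(p, k):
    """Probability that a pool of k individuals tests positive
    (Dorfman model, perfect assay assumed)."""
    return 1.0 - (1.0 - p) ** k


def simulate_total_pools(theta, r, rng):
    """Total number of pools tested until the r-th positive pool,
    drawn as a sum of r geometric(theta) waiting times."""
    total = 0
    for _ in range(r):
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        # Inverse-CDF draw of a geometric variable on {1, 2, ...}
        total += max(1, math.ceil(math.log(u) / math.log(1.0 - theta)))
    return total


def wald_width(r, n_pools, k, conf=0.95):
    """Width of a delta-method Wald CI for p, given r positives among n_pools pools."""
    if n_pools <= r:
        return math.inf  # theta_hat = 1: p_hat = 1 and the interval is degenerate
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)
    theta_hat = r / n_pools  # MLE of the pool-positive probability
    # Asymptotic Var(theta_hat) = theta^2 (1 - theta) / r under negative binomial sampling
    se_theta = theta_hat * math.sqrt((1.0 - theta_hat) / r)
    # Delta method for p = 1 - (1 - theta)^(1/k)
    dp_dtheta = (1.0 / k) * (1.0 - theta_hat) ** (1.0 / k - 1.0)
    return 2.0 * z * dp_dtheta * se_theta


def assurance(r, p, k, omega, conf=0.95, n_sim=2000, seed=1):
    """Monte Carlo estimate of P(W <= omega) when sampling stops at r positive pools."""
    rng = random.Random(seed)
    theta = pool_positive_prob(p, k)
    hits = 0
    for _ in range(n_sim):
        n_pools = simulate_total_pools(theta, r, rng)
        if wald_width(r, n_pools, k, conf) <= omega:
            hits += 1
    return hits / n_sim


def required_positive_pools(p, k, omega, gamma, conf=0.95, n_sim=2000, r_max=1000):
    """Smallest r (searched upward from 1) whose simulated assurance reaches gamma."""
    for r in range(1, r_max + 1):
        if assurance(r, p, k, omega, conf, n_sim) >= gamma:
            return r
    raise ValueError("no r <= r_max reached the target assurance")


if __name__ == "__main__":
    # Hypothetical scenario: p = 0.05, pools of k = 10, desired width 0.04, assurance 0.90
    print(required_positive_pools(p=0.05, k=10, omega=0.04, gamma=0.90))
```

A plain linear search over r is used for clarity. Because every call reuses the same seed, the returned r is the first value that meets γ under that random stream; increasing n_sim reduces Monte Carlo noise at the cost of run time.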
Conclusions: The three methods ensure precision in the estimated proportion of AP because they guarantee that the width (W) of the confidence interval (CI) will be equal to or narrower than the desired width (ω) with probability γ. The Monte Carlo study showed that the computational Wald procedure (method 2) produces the most precise sample size (with coverage and assurance levels very close to nominal values), that the sample size based on the Clopper-Pearson CI (method 1) is conservative (it overestimates the sample size), and that the analytic Wald sample size method we developed (method 3) sometimes underestimates the optimum number of pools.

Citation: Montesinos-López OA, Montesinos-López A, Crossa J, Eskridge K (2012) Sample Size under Inverse Negative Binomial Group Testing for Accuracy in Parameter Estimation. PLoS ONE 7(3): e32250. doi:10.1371/journal.pone.0032250

Editor: Ken R. Duffy, National University of Ireland Maynooth, Ireland

Received August 16, 2011; Accepted January 25, 2012; Published March 22, 2012

Copyright: © 2012 Montesinos-López et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: oamontes1@ucol.mx (OM); j.crossa@cgiar.org (JC)

Introduction

To detect the presence of a rare event, thousands of individuals need to be tested, and the cost of such testing usually exceeds the available budget and staff. The pooling methodology (Dorfman method) was first proposed to save a significant amount of money when screening soldiers for syphilis [1]. Significant cost savings were achieved by first testing a sample created by mixing blood from several people.
If the pooled sample tested positive, the blood of each individual in that pool was retested; if it tested negative, all individuals in that pool were declared free of the disease [1]. Currently the Dorfman method is used for detecting and estimating the proportion of positive individuals in fields such as medicine [2,3,4,5], agriculture [6], telecommunications [7], and science fiction [8]. Most applications for detecting and estimating a proportion are developed using binomial sampling; however, Pritchard and Tebbs [9] have suggested that inverse (negative) binomial pooled sampling may be preferred when the prevalence p is known to be small, when sampling and testing occur sequentially, or when positive pool results require immediate analysis, for example, in the case of many rare diseases. Unlike binomial sampling, in this model the number of positive pools to be observed, r, is fixed a priori, and testing is complete when the rth positive pool is observed [10]. George and Elston [11] recommended using geometric sampling when the probability of an event is small; they gave confidence intervals for the prevalence based on individual testing. Also, according to Haldane [12], using a binomial distribution may not provide an unbiased and precise estimate of p when p is small (p ≤ 0.1). Lui [13] extended George and Elston's work [11] on the confidence interval (CI) by considering negative binomial sampling and showed that as the required number of successes increased, the width of the CI decreased. However, this extension was also under individual testing. Using negative binomial group testing, Katholi [14] derived point and interval estimators of p by both classical and Bayesian methods and investigated their statistical properties.
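The pooling and inverse sampling schemes described above can be illustrated with a small Python simulation. This is a hedged sketch, not code from the paper, assuming a perfect assay and independent individuals with prevalence p: it forms pools of k individuals, tests them one at a time until the rth positive pool, and returns the plug-in maximum likelihood estimate p̂ = 1 − (1 − r/N)^(1/k), where N is the number of pools tested. It also computes the expected number of tests per individual under Dorfman's two-stage retesting, 1/k + 1 − (1 − p)^k, to show where the cost savings come from. Function names are illustrative only.

```python
import random


def dorfman_tests_per_individual(p, k):
    """Expected tests per individual under Dorfman two-stage testing with a
    perfect assay: one test per pool of k, plus k retests if the pool is positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def inverse_pooled_sample(p, k, r, rng):
    """Test pools of k individuals sequentially until the r-th positive pool.
    Returns the total number of pools tested (N)."""
    positives = 0
    n_pools = 0
    while positives < r:
        n_pools += 1
        # A pool is positive if at least one of its k members is positive
        if any(rng.random() < p for _ in range(k)):
            positives += 1
    return n_pools


def mle_prevalence(r, n_pools, k):
    """Plug-in MLE: theta_hat = r / N, then p_hat = 1 - (1 - theta_hat)^(1/k)."""
    theta_hat = r / n_pools
    return 1.0 - (1.0 - theta_hat) ** (1.0 / k)


if __name__ == "__main__":
    rng = random.Random(2012)
    n = inverse_pooled_sample(p=0.02, k=10, r=20, rng=rng)
    print(n, mle_prevalence(20, n, 10))
    print(dorfman_tests_per_individual(0.02, 10))  # ≈ 0.283
```

With p = 0.02 and k = 10, Dorfman testing needs about 0.283 tests per individual, roughly 28 tests per 100 individuals instead of 100 individual tests.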
Recently, Pritchard and Tebbs [9] used maximum likelihood as a basis for developing three point and interval estimators for p under inverse pooled sampling; they compared its performance

PLoS ONE | www.plosone.org 1 March 2012 | Volume 7 | Issue 3 | e32250