AJR 2005;184:1057–1064 0361–803X/05/1844–1057 © American Roentgen Ray Society

Fundamentals of Clinical Research for Radiologists

Statistical Inference for Proportions

Lawrence Joseph1,2 and Caroline Reinhold3,4

Received November 5, 2004; accepted after revision November 10, 2004.

Series editors: Nancy Obuchowski, C. Craig Blackmore, Steven Karlik, and Caroline Reinhold.

This is the 16th in the series designed by the American College of Radiology (ACR), the Canadian Association of Radiologists, and the American Journal of Roentgenology. The series, which will ultimately comprise 22 articles, is designed to progressively educate radiologists in the methodologies of rigorous clinical research, from the most basic principles to a level of considerable sophistication. The articles are intended to complement interactive software that permits the user to work with what he or she has learned, which is available on the ACR Web site (www.acr.org).

Project coordinator: G. Scott Gazelle, Chair, ACR Commission on Research and Technology Assessment.

Staff coordinator: Jonathan H. Sunshine, Senior Director for Research, ACR.

1 Division of Clinical Epidemiology, Montreal General Hospital, Department of Medicine, 1650 Cedar Ave., Montreal, QC H3G 1A4, Canada.

2 Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Ave. W, Montreal, QC H3A 1A2, Canada. Address correspondence to L. Joseph (Lawrence.Joseph@mcgill.ca).

3 Department of Diagnostic Radiology, Montreal General Hospital, McGill University Health Centre, 1650 Cedar Ave., Montreal, QC H3G 1A4, Canada.

4 Synarc Inc., 575 Market St., San Francisco, CA 94105.

This module will discuss the most commonly used statistical procedures when the parameters of interest arrive in the form of proportions.
Understanding these methods is especially important to radiologists because so much radiologic research and clinical work involves dichotomous (e.g., yes or no, present or absent) outcomes summarized as proportions. For example, a given disease or condition may be present or absent in any given subject, and any time a diagnostic tool is used, test characteristics such as sensitivity, specificity, and positive and negative predictive values are all summarized as proportions.

We will continue to use the three basic methods for statistical inferences, including p values and confidence intervals (CIs) from a frequentist viewpoint, and posterior distributions leading to credible intervals from a Bayesian viewpoint. We will only briefly review the basic principles behind these generic inferential principles, so readers may wish to ensure they have a good understanding of the previous module [1] in this series before tackling this one. It may also be useful to recall the basic properties of the binomial distribution [2] because it is the central distribution used for inferences involving proportions.

We begin with inferences for single proportions, which are covered in the next section. Then we discuss inferences for two or more proportions from independent groups, inferences for dependent proportions, sample size determination for studies involving one or two proportions, and Bayesian methods for proportions. Finally, we will summarize what we have learned in this module.

Inferences for Single Proportions

Standard Frequentist Hypothesis Testing

Suppose a new computer-aided automated system for the detection of lung nodules on chest radiographs has been developed [3]. Suppose further that one wishes to investigate whether this new system provides improved sensitivity compared with standard detection via non-computer-aided methods of analyzing chest radiographs.
In other words, suppose that chest radiographs are taken from a series of subjects who all truly have lung nodules, and we know that using standard (non-computer-aided) methods 90% of them will be found to have lung nodules and 10% of these cases will be missed. Is there evidence that the new computer-aided automated system provides increased sensitivity compared with the standard method of detection?

To look for evidence of improved sensitivity in the new automated system, we might wish to test the null hypothesis (H0) that the automated system is in fact not better than standard detection, versus an alternative hypothesis (HA) that it is better. Formally, we can state these hypotheses as

H0: p ≤ 0.9 versus HA: p > 0.9,

where p represents the unknown true probability of success of the new automated system in detecting lung nodules.

Suppose that we observe the results from 10 subjects with lung nodules, and all 10 test positively with the new automated system. Recalling the correct definition of a p value [1] (it is the probability of obtaining a result as extreme as or more extreme than the result observed, given that the null hypothesis is exactly correct), how would we calculate the p value in this case? For our example of the new automated technique, the definition implies that we need to calculate the probability of obtaining 10 (or more, but in this case more than 10 is impossible) successful
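The tail probability just described can be computed directly from the binomial distribution. The sketch below is illustrative (the function name `binomial_tail` is ours, not from the article); it assumes, as in the example, that the number of positive tests follows a binomial distribution with n = 10 and, under the null hypothesis, success probability p = 0.9:

```python
from math import comb

# P(X >= x) for X ~ Binomial(n, p): sum the binomial probabilities
# over all outcomes as extreme as or more extreme than x.
def binomial_tail(x, n, p):
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x, n + 1))

# Lung-nodule example: 10 detections in 10 subjects under H0 (p = 0.9).
# With x = n = 10, the sum has a single term and reduces to 0.9 ** 10.
p_value = binomial_tail(10, 10, 0.9)
print(round(p_value, 4))  # 0.3487
```

A p value of about 0.35 is far above conventional thresholds such as 0.05, so even a perfect 10-for-10 result in such a small sample is quite compatible with the null hypothesis that p is no greater than 0.9.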