AJR 2005;184:1057–1064
0361–803X/05/1844–1057
© American Roentgen Ray Society
Research
Fundamentals of Clinical Research for Radiologists
Statistical Inference for Proportions
Lawrence Joseph1,2 and Caroline Reinhold3,4
Received November 5, 2004; accepted after revision November 10, 2004.
Series editors: Nancy Obuchowski, C. Craig Blackmore,
Steven Karlik, and Caroline Reinhold.
This is the 16th in the series designed by the American
College of Radiology (ACR), the Canadian Association of
Radiologists, and the American Journal of Roentgenology.
The series, which will ultimately comprise 22 articles, is
designed to progressively educate radiologists in the
methodologies of rigorous clinical research, from the most
basic principles to a level of considerable sophistication.
The articles are intended to complement interactive
software that permits the user to work with what he or she
has learned, which is available on the ACR Web site
(www.acr.org).
Project coordinator: G. Scott Gazelle, Chair, ACR
Commission on Research and Technology Assessment.
Staff coordinator: Jonathan H. Sunshine, Senior Director
for Research, ACR.
1 Division of Clinical Epidemiology, Montreal General Hospital, Department of Medicine, 1650 Cedar Ave., Montreal, QC H3G 1A4, Canada.
2 Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Ave. W, Montreal, QC H3A 1A2, Canada. Address correspondence to L. Joseph (Lawrence.Joseph@mcgill.ca).
3 Department of Diagnostic Radiology, Montreal General Hospital, McGill University Health Centre, 1650 Cedar Ave., Montreal, QC H3G 1A4, Canada.
4 Synarc Inc., 575 Market St., San Francisco, CA 94105.
This module will discuss the most commonly used statistical procedures when the parameters of interest arrive in the form of proportions. Understanding these methods is especially important to radiologists because so much radiologic research and clinical work involves dichotomous (e.g., yes or no, present or absent) outcomes summarized as proportions. For example, a given disease or condition may be present or absent in any given subject, and any time a diagnostic tool is used, test characteristics such as sensitivity, specificity, and positive and negative predictive values are all summarized as proportions.
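To make this concrete, each of these test characteristics is simply a proportion computed from the cells of a two-by-two table of test result versus true disease status. A minimal sketch in Python, using hypothetical counts (not data from this article):

```python
# Hypothetical counts from a diagnostic study:
# rows = true disease status, columns = test result
tp, fn = 45, 5     # subjects with disease: test positive, test negative
fp, tn = 10, 90    # subjects without disease: test positive, test negative

sensitivity = tp / (tp + fn)   # proportion of diseased subjects detected
specificity = tn / (tn + fp)   # proportion of disease-free subjects correctly negative
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

print(sensitivity, specificity)  # 0.9 0.9
```

Each quantity is an observed proportion, and so the inferential machinery developed below applies to all of them.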
We will continue to use the three basic methods for statistical inferences, including p values and confidence intervals (CIs) from a frequentist viewpoint, and posterior distributions leading to credible intervals from a Bayesian viewpoint. We will only briefly review the basic principles behind these generic inferential principles, so readers may wish to ensure they have a good understanding of the previous module [1] in this series before tackling this one. It may also be useful to recall the basic properties of the binomial distribution [2] because it is the central distribution used for inferences involving proportions.
We begin with inferences for single proportions, which are covered in the next section. Then we discuss inferences for two or more proportions from independent groups, inferences for dependent proportions, sample size determination for studies involving one or two proportions, and Bayesian methods for proportions. Finally, we will summarize what we have learned in this module.
Inferences for Single Proportions
Standard Frequentist Hypothesis Testing
Suppose a new computer-aided automated system for the detection of lung nodules on chest radiographs has been developed [3]. Suppose further that one wishes to investigate whether this new system provides improved sensitivity compared with standard detection via non-computer-aided methods of analyzing chest radiographs. In other words, suppose that chest radiographs are taken from a series of subjects who all truly have lung nodules, and we know that using standard (non-computer-aided) methods 90% of them will be found to have lung nodules and 10% of these cases will be missed. Is there evidence that the new computer-aided automated system provides increased sensitivity compared with the standard method of detection?
To look for evidence of improved sensitivity in the new automated system, we might wish to test the null hypothesis (H0) that the automated system is in fact not better than standard detection, versus an alternative hypothesis (HA) that it is better. Formally, we can state these hypotheses as:

H0: p ≤ 0.9
HA: p > 0.9

where p represents the unknown true probability of success of the new automated system in detecting lung nodules.
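As a sketch of how such a one-sided test can be computed, the exact binomial tail probability under H0, evaluated at the boundary value p = 0.9, can be summed directly. The counts below anticipate the worked example that follows, and the function name is ours:

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) when X ~ Binomial(n, p): the one-sided p value."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# If all 10 of 10 subjects test positive, the p value is
# P(X >= 10 | p = 0.9) = 0.9**10.
p_value = binomial_tail(10, 10, 0.9)
print(round(p_value, 4))  # 0.3487
```

The same quantity is available from standard statistical libraries (e.g., an exact binomial test with a one-sided alternative); summing the terms by hand simply makes the definition of the p value explicit.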
Suppose that we observe the results from 10 subjects with lung nodules, and all 10 test positively with the new automated system. Recalling the correct definition of a p value [1] (it is the probability of obtaining a result as extreme as or more extreme than the result observed, given that the null hypothesis is exactly correct), how would we calculate the p value in this case? For our example of the new automated technique, the definition implies that we need to calculate the probability of obtaining 10 (or more, but in this case more than 10 is impossible) successful