Interpreting Diagnostic Test Accuracy Studies
Patrick M.M. Bossuyt
A central phase in the evaluation of a medical test is the assessment of its diagnostic
accuracy: the degree to which the test results correspond to those of the clinical reference
standard. There are several methods by which the results of diagnostic accuracy studies
can be summarized, reported, and interpreted. Here we provide an overview and a critical
commentary of three groups of measures: error-based measures, information-based measures, and
measures of the strength of the association. All of these measures may vary between
studies, with changes in the definition of the target condition, the spectrum of disease, the
setting, and the amount of prior testing. We discuss the relativity of the claim that likelihood
ratios are a superior way of expressing diagnostic accuracy, and defend the use of the
sometimes downgraded sensitivity and specificity.
Semin Hematol 45:189-195 © 2008 Elsevier Inc. All rights reserved.
A central phase in the evaluation of a medical test is the
assessment of its diagnostic accuracy: its ability to distinguish
between patients with and patients without disease
or, more generally, between those with and without the target
condition.1 There are several ways in which the results of
studies to evaluate the diagnostic accuracy of a test can be
summarized, reported, and interpreted. This article presents
an overview and a critical commentary of the available mea-
sures. It tries to do so from the perspective of decision-mak-
ing, in which accuracy studies must provide the evidence to
guide decisions about approval for marketing and purchas-
ing, decisions regarding whether or not to include a test in
practice guidelines, and decisions about test ordering and
interpretation in individual patients.
The first part of this presentation summarizes existing mea-
sures for reporting accuracy studies. The second offers a critical
analysis. The use of the sometimes downgraded sensitivity and
specificity is defended, and the relativity of the claim that likeli-
hood ratios are a superior way of expressing diagnostic accuracy
is discussed. The final section discusses how diagnostic test
accuracy should be considered in an appraisal of the health
benefits that testing can bring about.
Diagnostic Accuracy Studies
In studies of diagnostic accuracy, the outcomes from one or
more tests are compared with outcomes of the reference stan-
dard in the same study participants. The clinical reference
standard is the best available method to establish the pres-
ence of the target condition in patients. The target condition
can be a target disease, a disease stage, or some other condi-
tion that qualifies patients for a particular form of manage-
ment. The reference standard can be a single test, a series of
tests, a panel-based decision, or some other procedure.2 For
simplicity, we will assume that the results of the test can be
classified as positive, pointing to the presence of disease, or
negative. We also assume that the target condition is either
present or absent, and the clinical reference standard is able
to identify it in all patients.
Figure 1A shows the basic structure of a typical diagnostic
test accuracy study. Figure 1B offers an example of a (hypo-
thetical) diagnostic test accuracy study of a qualitative D-dimer
assay to help identify patients with pulmonary embolism (PE) in
an emergency department setting. Contrast-enhanced helical
computed tomography (CT) imaging was used as the reference
standard. Table 1 shows the results of this study in a 2 × 2
table. Table 2 shows a number of measures that can be cal-
culated from the data, grouped in three categories. These
three groups of measures are discussed in the following sec-
tions.
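As a minimal sketch of how paired results from the test and the reference standard can be cross-classified into the four cells of a 2 × 2 table, consider the following Python fragment. The function name `cross_tabulate`, the cell labels (TP, FP, FN, TN for true positives, false positives, false negatives, and true negatives), and the six paired observations are assumptions introduced here for illustration only; they are not the data of Table 1.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Tally paired (test positive, reference standard positive) results.

    Each element of `pairs` is a tuple of two booleans: the result of the
    test under evaluation and the result of the reference standard.
    Returns a dict with the four cells of a 2 x 2 table.
    """
    labels = {
        (True, True): "TP",    # test positive, target condition present
        (True, False): "FP",   # test positive, target condition absent
        (False, True): "FN",   # test negative, target condition present
        (False, False): "TN",  # test negative, target condition absent
    }
    counts = Counter(labels[(bool(test), bool(ref))] for test, ref in pairs)
    # Report every cell, including cells with no observations
    return {cell: counts.get(cell, 0) for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical paired results, NOT the data of Table 1
example_pairs = [
    (True, True), (True, False), (True, False),
    (False, False), (False, True), (False, False),
]
table = cross_tabulate(example_pairs)
print(table)
```

The sketch relies on the two simplifying assumptions stated earlier: test results are either positive or negative, and the reference standard classifies the target condition as present or absent in every participant.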
Error-Based Measures
Accuracy refers to the quality of the diagnostic classification
by the test under evaluation: its ability to correctly identify
diseased patients as such. One of the measures of diagnostic
accuracy is the overall fraction correct, sometimes also referred
to as simply “accuracy.” In the example in Table 1,
42% of the study patients were correctly classified by the
D-dimer test.
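The overall fraction correct is the number of participants in whom the test result agrees with the reference standard, divided by the total number of participants. A minimal sketch of that arithmetic follows; the function name `fraction_correct` and the cell counts are hypothetical and are not those of Table 1.

```python
def fraction_correct(tp, fp, fn, tn):
    """Overall fraction correct from the four cells of a 2 x 2 table.

    tp, tn: participants in whom the test agrees with the reference standard
    fp, fn: participants in whom the test disagrees with the reference standard
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the 2 x 2 table is empty")
    return (tp + tn) / total


# Hypothetical counts, NOT the data of Table 1
accuracy = fraction_correct(tp=30, fp=50, fn=10, tn=110)
print(f"Overall fraction correct: {accuracy:.0%}")
```

With these illustrative counts, 140 of 200 participants are classified correctly, a fraction of 0.70.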
Department of Clinical Epidemiology and Biostatistics, Academic Medical
Center, University of Amsterdam, Amsterdam, the Netherlands.
Address correspondence to Patrick M.M. Bossuyt, PhD, Department of Clin-
ical Epidemiology and Biostatistics, Academic Medical Center, Univer-
sity of Amsterdam, Room J1b-214, PO Box 22700, 1100 DE Amsterdam,
the Netherlands. E-mail: p.m.bossuyt@amc.uva.nl
doi:10.1053/j.seminhematol.2008.04.001