Interpreting Diagnostic Test Accuracy Studies

Patrick M.M. Bossuyt

A central phase in the evaluation of a medical test is the assessment of its diagnostic accuracy: the degree to which the test results correspond to those of the clinical reference standard. There are several methods by which the results of diagnostic accuracy studies can be summarized, reported, and interpreted. Here we provide an overview and a critical commentary of three groups of measures: error-based measures, information-based measures, and measures of the strength of the association. All of these measures may vary between studies with changes in the definition of the target condition, the spectrum of disease, the setting, and the amount of prior testing. We discuss the relativity of the claim that likelihood ratios are a superior way of expressing diagnostic accuracy, and defend the use of the sometimes downgraded sensitivity and specificity.

Semin Hematol 45:189-195 © 2008 Elsevier Inc. All rights reserved.

A central phase in the evaluation of a medical test is the assessment of its diagnostic accuracy: its ability to distinguish between patients with and patients without disease or, more generally, between those with and without the target condition.1 There are several ways in which the results of studies that evaluate the diagnostic accuracy of a test can be summarized, reported, and interpreted. This article presents an overview and a critical commentary of the available measures. It does so from the perspective of decision-making, in which accuracy studies must provide the evidence to guide decisions about approval for marketing and purchasing, decisions about whether or not to include a test in practice guidelines, and decisions about test ordering and interpretation in individual patients.

The first part of this article summarizes existing measures for reporting accuracy studies. The second offers a critical analysis.
The use of the sometimes downgraded sensitivity and specificity is defended, and the relativity of the claim that likelihood ratios are a superior way of expressing diagnostic accuracy is discussed. The final section discusses how diagnostic test accuracy should be considered in an appraisal of the health benefits that testing can bring about.

Diagnostic Accuracy Studies

In studies of diagnostic accuracy, the results of one or more tests are compared with the results of the reference standard in the same study participants. The clinical reference standard is the best available method to establish the presence of the target condition in patients. The target condition can be a target disease, a disease stage, or some other condition that qualifies patients for a particular form of management. The reference standard can be a single test, a series of tests, a panel-based decision, or some other procedure.2

For simplicity, we will assume that the results of the test can be classified as positive, pointing to the presence of disease, or negative. We also assume that the target condition is either present or absent, and that the clinical reference standard is able to identify it in all patients.

Figure 1A shows the basic structure of a typical diagnostic test accuracy study. Figure 1B offers an example of a (hypothetical) diagnostic test accuracy study of a qualitative D-dimer assay to help identify patients with pulmonary embolism (PE) in an emergency department setting. Contrast-enhanced helical computed tomography (CT) imaging was used as the reference standard. Table 1 shows the results of this study in a 2 × 2 table. Table 2 shows a number of measures that can be calculated from these data, grouped in three categories. These three groups of measures are discussed in the following sections.
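Under the simplifying assumptions above (a dichotomous test result and a target condition that the reference standard identifies in every participant), building the 2 × 2 table amounts to cross-classifying each participant's pair of results. The sketch below illustrates only that tallying step; the names `cross_classify` and `format_table` and the example participants are hypothetical and are not the data of Table 1.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Each pair is (index_test_positive, target_condition_present): the index test
# result and the reference standard verdict for the SAME participant.
Pair = Tuple[bool, bool]


def cross_classify(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Tally paired results into the four cells of a 2 x 2 table."""
    cells: Counter = Counter()
    for test_positive, condition_present in pairs:
        if test_positive and condition_present:
            cells["TP"] += 1  # true positive: test positive, condition present
        elif test_positive:
            cells["FP"] += 1  # false positive: test positive, condition absent
        elif condition_present:
            cells["FN"] += 1  # false negative: test negative, condition present
        else:
            cells["TN"] += 1  # true negative: test negative, condition absent
    # Return all four cells, including any with a zero count.
    return {cell: cells[cell] for cell in ("TP", "FP", "FN", "TN")}


def format_table(cells: Dict[str, int]) -> str:
    """Lay out the cells: index test in rows, reference standard in columns."""
    return "\n".join([
        f"{'':14}{'Condition +':>12}{'Condition -':>12}",
        f"{'Test +':14}{cells['TP']:>12}{cells['FP']:>12}",
        f"{'Test -':14}{cells['FN']:>12}{cells['TN']:>12}",
    ])


# Hypothetical participants, for illustration only (not the data of Table 1).
example_pairs = [
    (True, True), (True, False), (False, False),
    (False, False), (True, True), (False, True),
]

if __name__ == "__main__":
    print(format_table(cross_classify(example_pairs)))
```

Every measure in Table 2 is then a function of these four cell counts, which is why the table is the common starting point for all three groups of measures.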
Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands. Address correspondence to Patrick M.M. Bossuyt, PhD, Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, University of Amsterdam, Room J1b-214, PO Box 22700, 1100 DE Amsterdam, the Netherlands. E-mail: p.m.bossuyt@amc.uva.nl

doi:10.1053/j.seminhematol.2008.04.001

Error-Based Measures

Accuracy refers to the quality of the diagnostic classification by the test under evaluation: its ability to correctly identify diseased patients as diseased and non-diseased patients as non-diseased. One of the measures of diagnostic accuracy is the overall fraction correct, sometimes also referred to simply as "accuracy." In the example in Table 1, 42% of the study patients were correctly classified by the D-dimer test.
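The overall fraction correct is the number of correctly classified participants (true positives plus true negatives) divided by all participants in the 2 × 2 table. A minimal sketch follows; the function name `fraction_correct` and the cell counts are hypothetical and are not taken from Table 1.

```python
def fraction_correct(tp: int, fp: int, fn: int, tn: int) -> float:
    """Overall fraction correct ("accuracy") from the four 2 x 2 cell counts.

    Correct classifications are the true positives plus the true negatives;
    the denominator is every participant in the table.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the 2 x 2 table is empty")
    return (tp + tn) / total


# Hypothetical counts, for illustration only (not the data of Table 1).
print(f"{fraction_correct(tp=40, fp=30, fn=5, tn=25):.0%}")
```

Because the numerator pools both kinds of correct classification, the single figure does not reveal whether the remaining errors are false positives or false negatives.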