STATISTICAL DEVELOPMENTS AND APPLICATIONS
Diagnosing Tests: Using and Misusing Diagnostic and Screening Tests
David L. Streiner
Baycrest Centre for Geriatric Care
Department of Psychiatry
University of Toronto
Tests can be used either diagnostically (i.e., to confirm or rule out the presence of a condition in
people suspected of having it) or as screening instruments (i.e., to determine who in a large group of
people has the condition, often when those people are unaware of it or unwilling to admit to it).
Tests that are useful and accurate for diagnosis may actually do more harm than good when used for
screening. The reason is that when the prevalence of the condition is high, many of the negative
results may be false negatives, and when the prevalence is low (the usual situation with screening
tests), many of the positive results tend to be false positives. My first aim in this
article is to discuss the effects of the base rate, or prevalence, of a disorder on the accuracy of
test results. My second aim is to review some of the many diagnostic efficiency statistics that
can be derived from a 2 × 2 table, including the overall correct classification rate, kappa, phi, the
odds ratio, positive and negative predictive power and some variants of them, and likelihood
ratios. In the last part of this article, I review the recent Standards for Reporting of Diagnostic
Accuracy guidelines (Bossuyt et al., 2003) for reporting the results of diagnostic tests and extend
them to cover the types of tests used by psychologists.
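The 2 × 2 table statistics listed above can be computed from the four cell counts alone. The sketch below is a minimal illustration, assuming the usual layout (rows = test result, columns = true status) and the standard textbook formulas; the names TwoByTwo and efficiency_stats are hypothetical and do not come from the article, and the "variants" of predictive power mentioned above are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a 2 x 2 table: test result (rows) by true status (columns)."""
    tp: int  # test positive, condition present (true positive)
    fp: int  # test positive, condition absent (false positive)
    fn: int  # test negative, condition present (false negative)
    tn: int  # test negative, condition absent (true negative)


def efficiency_stats(t: TwoByTwo) -> dict[str, float]:
    """Standard diagnostic efficiency statistics; assumes no cell or margin is zero."""
    n = t.tp + t.fp + t.fn + t.tn
    test_pos, test_neg = t.tp + t.fp, t.fn + t.tn  # row margins
    present, absent = t.tp + t.fn, t.fp + t.tn     # column margins

    sensitivity = t.tp / present
    specificity = t.tn / absent
    overall_correct = (t.tp + t.tn) / n  # overall correct classification rate

    # Cohen's kappa: observed agreement corrected for agreement expected from the margins
    chance = (test_pos * present + test_neg * absent) / n**2
    kappa = (overall_correct - chance) / (1 - chance)

    # Phi: the Pearson correlation between two dichotomous variables
    phi = (t.tp * t.tn - t.fp * t.fn) / math.sqrt(test_pos * test_neg * present * absent)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "overall_correct": overall_correct,
        "kappa": kappa,
        "phi": phi,
        "odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
        "ppv": t.tp / test_pos,  # positive predictive power
        "npv": t.tn / test_neg,  # negative predictive power
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    stats = efficiency_stats(TwoByTwo(tp=80, fp=40, fn=20, tn=160))
    for name, value in stats.items():
        print(f"{name:>16}: {value:.3f}")
```

For these made-up counts the sketch gives sensitivity and specificity of .80, an odds ratio of 16, a kappa of about .57, a positive likelihood ratio of 4, and a negative likelihood ratio of .25. Returning a plain dictionary keeps each statistic's formula visible next to its name rather than hiding it behind a library call.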
Within the past few years, diagnostic and screening tests have been the focus of many articles
in the popular press. On the one hand, some governments and blood-collection agencies have been
criticized and sued for not adequately screening blood and blood products for HIV and hepatitis.
On the other hand, recent meta-analyses have cast doubt on the usefulness of both breast
self-examination (Baxter & the Canadian Task Force on Preventive Health Care, 2001) and
mammography (Olsen & Gotzsche, 2001) in younger women for preventing breast cancer, and a court
decision has thrown out the polygraph, or lie detector, as evidence in criminal cases (Committee
to Review the Scientific Evidence on the Polygraph, 2003; United States v. Scheffer, 1998). These
reports have generated considerable uncertainty and confusion and raise four questions: (a) What
is the difference between diagnostic and screening tests? (b) Under what circumstances is each of
them useful? (c) When can they do more harm than good? (d) What should be the minimum criteria for
reporting studies about tests?
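Question (c) can be made concrete with the base-rate argument from the abstract. The sketch below is a minimal illustration, assuming a hypothetical test whose sensitivity and specificity (both 90%) stay the same in every setting, so that only the prevalence changes; the function name predictive_values and all of the figures are illustrative and are not taken from the article.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test of fixed accuracy applied at a given base rate.

    Each cell is the expected proportion of the whole population, which is
    Bayes' theorem written out as a 2 x 2 table.
    """
    tp = sensitivity * prevalence              # true positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with 90% sensitivity and 90% specificity, used in three settings
settings = [("high-prevalence referrals", 0.90),
            ("diagnostic clinic", 0.50),
            ("population screening", 0.01)]
for label, prevalence in settings:
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"{label:>26}: base rate {prevalence:4.0%}  PPV {ppv:6.1%}  NPV {npv:6.1%}")
```

With these assumed figures, PPV falls from 90% in the diagnostic clinic to about 8% under population screening (roughly 11 of every 12 positive results are false), whereas at a 90% base rate NPV drops to 50%, so half of all negative results are false negatives. The test itself is unchanged throughout; only the population it is given to differs.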
Diagnostic and screening tests are similar in that they are used to detect the presence or
absence of some attribute in people. In some cases, the question is how much of the attribute a
person has (e.g., aptitude and intelligence tests, university or graduate school admissions
exams), whereas in clinical settings people are often either unaware of whether they have it
(e.g., tuberculosis or Tay-Sachs disease) or unwilling to admit its presence (e.g., using illicit
drugs or having passed secrets to foreign governments). The difference between the two lies in the
way they are used, not in how they are developed. Diagnostic tests are used when a person is
suspected of having the attribute, and the purpose is to confirm or rule it out; screening tests,
as the name implies, are given more broadly, primarily to large groups of asymptomatic people, with
the aim of determining which (if any) of them have the attribute in question.¹
The groups who are given screening
JOURNAL OF PERSONALITY ASSESSMENT, 81(3), 209–219
Copyright © 2003, Lawrence Erlbaum Associates, Inc.
¹ The term screening test can also denote a briefer version of a diagnostic test, such as the
Brief Symptom Inventory (Derogatis & Spencer, 1982). In this article, the term refers only to the
way a test is used (i.e., for screening purposes), not to its length.