J Clin Epidemiol Vol. 52, No. 3, pp. 229–235, 1999
Copyright © 1999 Elsevier Science Inc. All rights reserved.
0895-4356/99/ $–see front matter
PII S0895-4356(98)00168-1
Increasing Physicians’ Awareness of the Impact of
Statistics on Research Outcomes: Comparative Power of
the t-test and Wilcoxon Rank-Sum Test in Small
Samples Applied Research
Patrick D. Bridge¹,* and Shlomo S. Sawilowsky²
¹Department of Family Medicine, Wayne State University School of Medicine, and ²Department of Theoretical and
Behavioral Foundations, College of Education, Wayne State University, Detroit, Michigan
ABSTRACT. To effectively evaluate medical literature, practicing physicians and medical researchers must
understand the impact of statistical tests on research outcomes. Applying inefficient statistics not only increases
the need for resources, but more importantly increases the probability of committing a Type I or Type II error.
The t-test is one of the most prevalent tests used in the medical field and is the uniformly most powerful
unbiased test (UMPU) under normal curve theory. But does it maintain its UMPU properties when assumptions
of normality are violated? A Monte Carlo investigation evaluated the comparative power of the independent
samples t-test and its nonparametric counterpart, the Wilcoxon Rank-Sum (WRS) test, under violations of
population normality, using three commonly occurring distributions and small sample sizes. The t-test was more
powerful under relatively symmetric distributions, although the magnitude of the differences was moderate.
Under distributions with extreme skew, the WRS held large power advantages. When distributions have
heavier tails or extreme skew, the WRS should be the test of choice. In turn, when population characteristics are
unknown, the WRS is recommended, based on the magnitude of these power differences in extreme skews, and
the modest variation in symmetric distributions. J CLIN EPIDEMIOL 52;3:229–235, 1999. © 1999 Elsevier Science Inc.
KEY WORDS. Research methods; t-test; Wilcoxon Rank-Sum test; nonparametric statistics; parametric statistics;
power
INTRODUCTION
The use of statistics in medical research has increased considerably
in the past 60 years, and the types of statistics used
have become much more complex [1]. Concomitant with
this influx in the use of statistics, unfortunately, was an in-
crease in statistical errors [2–12]. For a practicing physician
trying to stay current with the literature, or a medical re-
searcher adding new knowledge to the field, it is important
to understand the application and efficiency of statistical
tests and their impact on the outcomes of research.
Many researchers view the application of statistical tests
as a simple and clear-cut process. Nevertheless, in many sit-
uations the appropriate application of statistical tests may
be unclear, and in other situations, controversial. For exam-
ple, consider the independent samples t-test, which is one
of the most prevalent statistics used in medicine, psychol-
ogy, and education research [13–16]. The t-test is derived
under the assumption of normality and is therefore the uniformly
most powerful unbiased test (UMPU) when data
are normally distributed. (UMPU means that, by definition,
when the data are normally distributed, no other unbiased test has
greater power to detect true differences for a given sample
size.) But does it maintain its UMPU properties when violations
of normality occur in the small samples typical of applied
research?
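The question can be explored with a small simulation. The following is a minimal Python sketch of a Monte Carlo power comparison, not the procedure used in this study. The function names, the lognormal distribution standing in for an extremely skewed population, the shift sizes, the sample size of 10 per group, and the number of repetitions are all illustrative assumptions. The WRS is computed here with SciPy's `mannwhitneyu`, which is the equivalent Mann-Whitney U formulation of the same test.

```python
import numpy as np
from scipy import stats


def draw_normal(rng, n):
    """Draw n values from a standard normal population."""
    return rng.normal(0.0, 1.0, n)


def draw_lognormal(rng, n):
    """Draw n values from a lognormal population (assumed stand-in for extreme skew)."""
    return rng.lognormal(0.0, 1.0, n)


def estimate_power(sampler, shift, n=10, reps=2000, alpha=0.05, seed=0):
    """Estimate the rejection rates of the t-test and the WRS test.

    Two samples of size n are drawn from the same population, the second
    is shifted by `shift`, and both two-sided tests are applied. With
    shift=0 the returned rates estimate the Type I error rate; with
    shift != 0 they estimate power.
    """
    rng = np.random.default_rng(seed)
    t_rejections = 0
    w_rejections = 0
    for _ in range(reps):
        x = sampler(rng, n)
        y = sampler(rng, n) + shift
        # Independent samples t-test (pooled variance by default).
        t_p = stats.ttest_ind(x, y).pvalue
        # Wilcoxon rank-sum test in its Mann-Whitney U form.
        w_p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        t_rejections += int(t_p < alpha)
        w_rejections += int(w_p < alpha)
    return t_rejections / reps, w_rejections / reps


# Illustrative runs: one symmetric and one heavily skewed population.
for name, sampler in [("normal", draw_normal), ("lognormal", draw_lognormal)]:
    t_power, w_power = estimate_power(sampler, shift=1.0)
    print(f"{name}: t-test power = {t_power:.3f}, WRS power = {w_power:.3f}")
```

Calling `estimate_power` with `shift=0.0` gives a rough check that both tests hold their nominal Type I error rate before their power is compared.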
In medicine and social and behavioral science, normal
theory tests (e.g., t-test, ANOVA) have been used more
extensively than nonparametric statistics (which do not ap-
peal to the population shape as part of their derivation)
[13–15, 17–19]. Yet, statisticians and researchers question
the frequency of normally distributed data in real world
problems [17, 20–24]. For example, Micceri [17] examined
440 psychometric and ability measures and found that all of
the data sets were nonnormal according to the Kolmogorov-Smirnov
test for normality at the 0.01 alpha level.
Only 3% were even remotely similar to the normal curve
(i.e., smooth and symmetric with light tails). The problem of
non-normality in applied data is also prevalent in medical
research. For example, a literature search was conducted to
*Address for correspondence: Patrick D. Bridge, University Health Center,
4201 St. Antoine, Room 4J, Wayne State University, Detroit, MI 48201.
Accepted for publication on 6 November 1998.