J Clin Epidemiol Vol. 52, No. 3, pp. 229–235, 1999
Copyright © 1999 Elsevier Science Inc. All rights reserved.
0895-4356/99/$–see front matter
PII S0895-4356(98)00168-1

Increasing Physicians’ Awareness of the Impact of Statistics on Research Outcomes: Comparative Power of the t-test and Wilcoxon Rank-Sum Test in Small Samples Applied Research

Patrick D. Bridge 1,* and Shlomo S. Sawilowsky 2
1 Department of Family Medicine, Wayne State University School of Medicine and 2 Department of Theoretical and Behavioral Foundations, College of Education, Wayne State University, Detroit, Michigan

ABSTRACT. To effectively evaluate medical literature, practicing physicians and medical researchers must understand the impact of statistical tests on research outcomes. Applying inefficient statistics not only increases the need for resources, but more importantly increases the probability of committing a Type I or Type II error. The t-test is one of the most prevalent tests used in the medical field and is the uniformly most powerful unbiased test (UMPU) under normal curve theory. But does it maintain its UMPU properties when assumptions of normality are violated? A Monte Carlo investigation evaluated the comparative power of the independent samples t-test and its nonparametric counterpart, the Wilcoxon Rank-Sum (WRS) test, under violations of population normality, using three commonly occurring distributions and small sample sizes. The t-test was more powerful under relatively symmetric distributions, although the magnitude of the differences was moderate. Under distributions with extreme skews, the WRS held large power advantages. When distributions have heavier tails or extreme skews, the WRS should be the test of choice. In turn, when population characteristics are unknown, the WRS is recommended, based on the magnitude of these power differences in extreme skews and the modest variation in symmetric distributions.
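As a rough illustration of the kind of Monte Carlo power comparison summarized in the abstract, the following Python sketch estimates how often each test rejects the null hypothesis at the 0.05 level. It is not the authors' procedure. The two populations (standard normal and lognormal), the shift sizes, the fixed group size of 10, the replication count, and all function names are assumptions chosen for illustration. The rank-sum test here uses the large-sample normal approximation rather than exact critical values.

```python
import math
import random
import statistics

N_PER_GROUP = 10      # assumed small-sample setting, fixed so that df = 18
T_CRIT_DF18 = 2.101   # two-sided 5% critical value of Student's t with 18 df
Z_CRIT = 1.960        # two-sided 5% critical value of the standard normal


def t_rejects(x, y):
    """Pooled-variance independent-samples t-test at alpha = 0.05 (df = 18)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * statistics.variance(x)
              + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return abs(t) > T_CRIT_DF18


def wrs_rejects(x, y):
    """Wilcoxon rank-sum test via the normal approximation.

    No tie or continuity correction is applied; ties are essentially
    impossible for the continuous populations simulated here.
    """
    n1, n2 = len(x), len(y)
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Rank sum of the first sample within the combined ordering.
    w = sum(rank for rank, (_, group) in enumerate(labelled, start=1) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(w - mean_w) / sd_w > Z_CRIT


def estimate_power(draw, shift, reps=2000, seed=1):
    """Return (t-test rejection rate, WRS rejection rate) over `reps` samples.

    `draw(rng)` returns one observation; `shift` is added to the first group.
    With shift = 0 the rates estimate the Type I error instead of power.
    """
    rng = random.Random(seed)
    t_hits = wrs_hits = 0
    for _ in range(reps):
        x = [draw(rng) + shift for _ in range(N_PER_GROUP)]
        y = [draw(rng) for _ in range(N_PER_GROUP)]
        t_hits += t_rejects(x, y)
        wrs_hits += wrs_rejects(x, y)
    return t_hits / reps, wrs_hits / reps


def normal(rng):
    return rng.gauss(0.0, 1.0)


def lognormal(rng):
    # Illustrative skewed population only.
    return rng.lognormvariate(0.0, 1.0)


if __name__ == "__main__":
    for name, draw in (("normal", normal), ("lognormal", lognormal)):
        for shift in (0.0, 0.5, 1.0):
            t_rate, wrs_rate = estimate_power(draw, shift)
            print(f"{name:9s} shift={shift:.1f}  t={t_rate:.3f}  WRS={wrs_rate:.3f}")
```

Running the script prints one line per population and shift, so the two rejection rates can be compared side by side. The shifts are in raw units, not standard deviations, so rates are comparable within a population but not across the two.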
KEY WORDS. Research methods; t-test; Wilcoxon Rank-Sum test; nonparametric statistics; parametric statistics; power

INTRODUCTION

The use of statistics in medical research has increased considerably in the past 60 years, and the types of statistics used have become much more complex [1]. Unfortunately, this growth in the use of statistics has been accompanied by an increase in statistical errors [2–12]. For a practicing physician trying to stay current with the literature, or a medical researcher adding new knowledge to the field, it is important to understand the application and efficiency of statistical tests and their impact on the outcomes of research.

Many researchers view the application of statistical tests as a simple and clear-cut process. Nevertheless, in many situations the appropriate application of statistical tests may be unclear, and in other situations, controversial. For example, consider the independent samples t-test, which is one of the most prevalent statistics used in medicine, psychology, and education research [13–16]. The t-test is derived under the assumption of normality and is therefore the uniformly most powerful unbiased test (UMPU) when data are normally distributed. (UMPU means that by definition, when the data are normally distributed, no other unbiased test has a greater ability to detect true differences for a given sample size.) But does it maintain its UMPU properties when normality is violated in the small samples typical of applied research?

In medicine and social and behavioral science, normal theory tests (e.g., t-test, ANOVA) have been used more extensively than nonparametric statistics (which do not appeal to the population shape as part of their derivation) [13–15, 17–19]. Yet, statisticians and researchers question the frequency of normally distributed data in real-world problems [17, 20–24].
For example, Micceri [17] examined 440 psychometric and ability measures and found that all of the data sets were nonnormal according to the Kolmogorov-Smirnov test for normality at the 0.01 alpha level. Only 3% were even remotely similar to the normal curve (i.e., smooth and symmetric with light tails). The problem of nonnormality in applied data is also prevalent in medical research. For example, a literature search was conducted to

*Address for correspondence: Patrick D. Bridge, University Health Center, 4201 St. Antoine, Room 4J, Wayne State University, Detroit, MI 48201.
Accepted for publication on 6 November 1998.