Reliability of Measurement and Power of Significance Tests Based on Differences

Donald W. Zimmerman, Carleton University
Richard H. Williams, University of Miami
Bruno D. Zumbo, University of Ottawa

The power of significance tests based on difference scores is indirectly influenced by the reliability of the measures from which differences are obtained. Reliability depends on the relative magnitude of true score and error score variance, but statistical power is a function of the absolute magnitude of these components. Explicit power calculations reaffirm the paradox put forward by Overall & Woodward (1975, 1976): significance tests of differences can be powerful even if the reliability of the difference scores is 0. This anomaly arises because power is a function of observed score variance, but it is not a function of reliability unless either true score variance or error score variance is constant. Provided that sample size, significance level, directionality, and the alternative hypothesis associated with a significance test remain the same, power always increases when population variance decreases, independently of reliability. Index terms: difference scores, error of measurement, power, significance tests, t test, test reliability, true scores.

The relation between the reliability of difference scores and the power of significance tests based on difference scores has been a troublesome issue in psychometrics for more than a decade. Overall & Woodward (1975, 1976) showed that a paired-samples Student t test can be powerful even though the reliability of the difference scores from which the t statistic is calculated is 0. This extreme example engendered controversy, and several authors (e.g., Fleiss, 1976; Nicewander & Price, 1983; Zimmerman & Williams, 1986) expressed opinions on the issue.

Recently, this issue has again come into prominence in an interchange of views (Humphreys & Drasgow, 1989a, 1989b; Overall, 1989a, 1989b). The issue is investigated here using concepts of statistical power analysis developed by Cohen (1988, 1990; originally published in 1969). Calculations are presented here that demonstrate how the statistical power associated with difference scores is influenced by reliability. The results of these calculations reemphasize the importance of the paradox noted by Overall & Woodward (1975, 1976). The present paper extends to difference scores some methods originally used to find relations between reliability and power associated with a single measurement (Williams & Zimmerman, 1989; Zimmerman & Williams, 1986).

Determinants of the Power of Significance Tests

The power function of a significance test is determined by sample size, population variance, significance level, the alternative hypothesis, and directionality, as well as by the way the test statistic uses the information in the sample data. The influence of population variance and the alternative hypothesis can be combined into a measure of effect size (Cohen, 1988). Therefore, the question "How does statistical power depend on reliability?" can be rephrased as "Everything else being equal, how does power change as reliability changes?" The answer is that statistical power does not change at all under these conditions, as the simulation sketch below illustrates.
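The reliability of the difference scores is the ratio of true score variance to observed score variance, whereas the power of a t test on the differences depends on the sample size and on the ratio of the mean difference to the observed score standard deviation. Any split of a fixed observed score variance into true and error components therefore leaves power unchanged. The following simulation is a minimal sketch of this point, not part of the original article; the parameter values, the normal-distribution assumptions, and the use of numpy and scipy are all illustrative choices.

```python
# Sketch: hold the observed-score variance of the difference scores fixed,
# vary the true/error split (the reliability), and estimate the power of a
# one-sample t test on the differences. All parameter values are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, delta = 25, 0.05, 0.5   # sample size, significance level, mean difference
var_d = 1.0                        # observed-score variance of the differences (fixed)
n_sim = 10_000                     # simulated samples per condition

for rho in (0.0, 0.5, 0.9):        # reliability of the difference scores
    var_true, var_err = rho * var_d, (1.0 - rho) * var_d
    rejections = 0
    for _ in range(n_sim):
        true_part = rng.normal(0.0, np.sqrt(var_true), n)   # true-score component
        error_part = rng.normal(0.0, np.sqrt(var_err), n)   # error component
        d = delta + true_part + error_part                  # observed difference scores
        _, p = stats.ttest_1samp(d, 0.0)                    # H0: mean difference = 0
        rejections += (p < alpha)
    print(f"reliability = {rho:.1f}: estimated power = {rejections / n_sim:.3f}")
```

Because the observed differences have the same distribution for every value of the reliability, the three power estimates agree apart from simulation error.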
If the variables just mentioned have fixed values, then the power of a significance test is completely determined.
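The Overall & Woodward (1975, 1976) paradox can be reproduced the same way. In the sketch below, again an illustration with assumed parameter values rather than a calculation from the article, pretest and posttest share a common true score, so every examinee's true change equals the same constant. The true score variance of the differences is then 0 and the reliability of the difference scores is exactly 0, yet the small error variance makes the paired-samples t test powerful.

```python
# Sketch of the zero-reliability paradox: the difference scores contain no
# true-score variance, but the paired t test still detects the constant shift.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha = 25, 0.05
shift = 0.5          # constant true change from pretest to posttest (assumed)
sd_err = 0.5         # error standard deviation on each occasion (assumed)
n_sim = 10_000

rejections = 0
for _ in range(n_sim):
    true_score = rng.normal(0.0, 1.0, n)                 # common true score
    pretest = true_score + rng.normal(0.0, sd_err, n)
    posttest = true_score + shift + rng.normal(0.0, sd_err, n)
    _, p = stats.ttest_rel(posttest, pretest)            # paired-samples t test
    rejections += (p < alpha)

print(f"estimated power = {rejections / n_sim:.3f}")
```

With these values the standard deviation of the differences is about 0.71, and the estimated power is roughly .9, even though the differences consist entirely of measurement error.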