Psychological Reports, 1977, 40, 931-935. © Psychological Reports 1977

UNRELIABILITY OF DIFFERENCE SCORES: A CLARIFICATION OF THE ISSUES¹

STEPHEN LEEB³ AND SHARON MEINBERG

New York University

Summary.-This paper is an attempt to clarify several recent issues that have been raised concerning the unreliability of difference scores. Articles by Overall and Woodward (1975, 1976) and Fleiss (1976) are considered. The conclusions of these articles are shown to be incomplete and to some extent misleading. We begin by noting that within the context of a repeated-measures design, when the reliability of difference scores is other than zero, there exists an interaction of subject by treatment. This interaction implies the existence of at least one additional systematic variable which contributes to the experimental results. We argue that without specific knowledge about the mean and variance of this systematic variable, the interpretation of mean changes across pretest and posttest is ambiguous. With complete knowledge of all variables affecting the experimental situation, the power of the t test on difference scores is directly related to both the individual reliabilities of the separate pretests and posttests and the reliability of the difference scores, as long as the interaction of subject by treatment is constant across experiments. If, on the other hand, that interaction is not constant across experiments, then the power of the t test on difference scores is inversely related to the reliability of the difference scores, as long as the individual reliabilities of the pretests and posttests are constant across experiments. Since it is much more reasonable to assume that the interaction, rather than the individual test reliabilities, is constant across experiments, it is more reasonable to conclude that, contrary to Overall and Woodward, the power of the t test for difference scores is directly related to the reliability of the difference scores.
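The two relations stated in the Summary can be illustrated numerically. The sketch below is not the authors' derivation. It assumes a simple additive model in which each difference score is D = delta + d + (e2 - e1), where d is the subject-by-treatment interaction with variance `var_interaction` and e1, e2 are independent measurement errors with common variance `var_error`. The function names, parameter names, and numerical values are hypothetical. Under that assumption the reliability of D is var_interaction / (var_interaction + 2 * var_error), and the power of the paired t test increases with the noncentrality parameter delta * sqrt(N) / sd(D).

```python
import math


def difference_reliability(var_interaction, var_error):
    """Reliability of D = post - pre under the assumed additive model.

    True-score variance of D is the interaction variance; the two
    independent errors contribute 2 * var_error to the observed variance.
    """
    return var_interaction / (var_interaction + 2.0 * var_error)


def paired_t_noncentrality(mean_change, var_interaction, var_error, n):
    """Noncentrality parameter of the paired t test on D.

    For fixed n and significance level, power increases with this value.
    """
    sd_difference = math.sqrt(var_interaction + 2.0 * var_error)
    return mean_change * math.sqrt(n) / sd_difference


# Hypothetical values chosen only for illustration.
MEAN_CHANGE, N = 1.0, 25

# Case A: interaction variance held constant, error variance shrinking.
case_a = [
    (difference_reliability(1.0, ve),
     paired_t_noncentrality(MEAN_CHANGE, 1.0, ve, N))
    for ve in (2.0, 1.0, 0.5)
]

# Case B: error variance held constant, interaction variance growing.
case_b = [
    (difference_reliability(vi, 1.0),
     paired_t_noncentrality(MEAN_CHANGE, vi, 1.0, N))
    for vi in (0.0, 1.0, 2.0)
]
```

In Case A reliability and noncentrality rise together (a direct relation). In Case B reliability rises while noncentrality falls (an inverse relation), and the reliability is zero exactly when the interaction variance is zero.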
¹Both authors share equally in this contribution; the names appear alphabetically.
³Also at Indicator Digest, 451 Grand Avenue, Palisades Park, New Jersey 07650 and Manufacturers Hanover Trust Company, New York, New York 10022.

Recently, Overall and Woodward (1975) claimed that maximal power for a t test, computed on difference scores, is attained when the reliability of the difference scores is zero. It has been shown by Fleiss (1976) that this claim is spurious; that, in fact, when a more general model is employed, one which includes an interaction of subject by treatment, "the maximal power for the t test is attained under the same circumstances as maximal reliability." While we agree in part with Fleiss' conclusion, we demonstrate that it is not as straightforward as it appears. A clearer picture of the issues relating to mean difference scores and the associated reliability of difference scores is presented here.

Suppose that N subjects are each measured before and after an experimental manipulation and that interest is in assessing the mean difference between the two occasions. We start by noting that the reliability of a difference score can be ex-