Psychological Bulletin
1968, Vol. 69, No. 3, 161-182

MULTIPLE REGRESSION IN PSYCHOLOGICAL RESEARCH AND PRACTICE

RICHARD B. DARLINGTON 1
Cornell University

A number of common practices and beliefs concerning multiple regression are criticized, and several paradoxical properties of the method are emphasized. Major topics discussed are the basic formulas; suppressor variables; measures of the "importance" of a predictor variable; inferring relative regression weights from relative validities; estimates of the true validity of population regression equations and of regression equations developed in samples; and statistical criteria for selecting predictor variables. The major points are presented in outline form in a final summary.

In recent years, electronic computers have made the multiple regression method readily available to psychologists and other scientists, while simultaneously making it unnecessary for them to study in full the cumbersome computational details of the method. Therefore, there is a need for a discussion of multiple regression which emphasizes some of the less obvious uses, limitations, and properties of the method. This article attempts to fill this need. It makes no attempt to cover thoroughly computational techniques or significance tests, both of which are discussed in such standard sources as McNemar (1962), Hays (1963), DuBois (1957), and Williams (1959). The discussion of significance tests by Williams is especially complete, as is the presentation of computing directions by DuBois. The latter source also contains many basic formulas of considerable interest. Anderson (1958) gives a very complete mathematical presentation of the exact sampling distributions of many of the statistics relevant to multiple regression. Elashoff and Afifi (1966) reviewed procedures applicable when some observations are missing. Beaton (1964) described a set of elegantly simple computer subroutines which a FORTRAN programmer can use to write quickly almost any standard or special-purpose regression program he may require.

1 For critical comments on preliminary drafts, the author is indebted to J. Millman, P. C. Smith, and T. A. Ryan, and to his students J. T. Barsis, W. Buckwalter, H. Day, B. Goldwater, and G. F. Stauffer. He is especially grateful to his student C. S. Otterbein, whose editorial and substantive contributions amounted nearly to coauthorship.

Some of the points made herein are original, some have been derived independently by several workers in recent years, and some surprisingly little-known points were made in print 40 or more years ago.

In general, the dependent or criterion variable will be denoted by X_0, and the independent or predictor variables by X_1, X_2, ..., X_n. The score of person i on variable X_j is symbolized by x_ij. The population multiple correlation is denoted by R, ordinary correlations by ρ, and standard deviations by σ. Population regression weights are denoted by β, with β' denoting the corresponding weights when all variables have been adjusted to standard score form. Sample values of these parameters are denoted by R, r, s, b, and b'.
The purpose of the multiple regression method is to derive weights β_1, β_2, ..., β_n for the variables X_1, X_2, ..., X_n, and an additive constant a, such that the resulting weighted composite X̂_0, which is defined by the multiple regression equation

X̂_0 = β_1 X_1 + β_2 X_2 + · · · + β_n X_n + a   [1]

predicts a specified criterion variable X_0 with a minimum sum of squared errors; thus X̂_0 correlates maximally with X_0. This paper deals directly only with the linear additive model, in which X̂_0 is a linear function of the predictor variables. This restriction is more apparent than real, however, since if desired some of the variables in the equation can be curvilinear or configural (interactive) functions of other variables.
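The least-squares idea behind Equation 1 can be illustrated numerically. The sketch below is not part of the original article; it assumes synthetic data for two illustrative predictors and uses NumPy to compute the sample weights b_1, b_2 and constant a that minimize the sum of squared errors, then reports the sample multiple correlation R between the criterion and the fitted composite.

```python
# Illustrative sketch (not from the article): least-squares estimation of
# sample regression weights b_1, b_2 and the additive constant a.
import numpy as np

rng = np.random.default_rng(0)
n_obs = 200

# Two synthetic predictors X1, X2 and a criterion X0 with known structure.
x1 = rng.normal(size=n_obs)
x2 = rng.normal(size=n_obs)
x0 = 0.6 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=n_obs)

# Design matrix with a leading column of ones for the additive constant a.
X = np.column_stack([np.ones(n_obs), x1, x2])

# Least-squares solution: minimizes the sum of squared errors (X0 - X0_hat)^2.
coef, *_ = np.linalg.lstsq(X, x0, rcond=None)
a, b1, b2 = coef

x0_hat = X @ coef                      # fitted composite
R = np.corrcoef(x0, x0_hat)[0, 1]      # sample multiple correlation

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, R = {R:.3f}")
```

In the same spirit, a curvilinear or configural term (for example, X1 squared or the product X1·X2) could simply be appended as an additional column of the design matrix, which is the sense in which the linear additive model is less restrictive than it first appears.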