Psychological Bulletin 1989, Vol. 106, No. 1, 155-160
Copyright 1989 by the American Psychological Association, Inc. 0033-2909/89/$00.75

Significance Tests and the Duplicity of Binary Decisions

Robert Folger
A. B. Freeman School of Business, Tulane University

I thank Bill Dunlap and Irving LaValle for comments. Correspondence concerning this article should be addressed to Robert Folger, A. B. Freeman School of Business, Tulane University, New Orleans, Louisiana 70118.

Presents a logical justification for the following statements and discusses their implications: It is duplicitous (misleading) to use significance tests for making binary (either/or) decisions regarding the validity of a theory; the binary choice between calling results significant or not significant should not govern the confidence placed in a theory, because such confidence cannot be gained in the either/or fashion characterizing deductive certainty. The implications include grounds for describing ways that effect size estimates become useful in making judgments about the value of theories.

Charles Sanders Peirce, who some say was America's greatest philosopher of science, wrote that "science can only rest on experience"; he then added that "experience can never result in absolute certainty, exactitude, or universality" (Buchler, 1955, p. 47). Hypothesis-testing procedures make explicit science's lack of certainty. In particular, inferential statistics provide estimates of whether a finding reflects merely random variation, an outcome owing to "chance" rather than to causal mechanisms hypothesized by theories.

I contend that such procedures can be used in a manner that would deserve the label of duplicity (i.e., an attempt to dupe the unsuspecting) if it were actually intentional. Inferential statistics can be especially misleading when carelessly applied logic suggests an unwarranted degree of deductive certainty. Chow's (1988, Table 2) syllogistic description of hypothesis testing (see my Table 1) illustrates how unwarranted conclusions can be reached by seemingly logical steps. Syllogisms are either valid or invalid; hence "logical validity is an all-or-none property of an argument" (Chow, 1988, p. 109). Chow thereby mirrors common practice by emphasizing the binary choice between rejecting and not rejecting a theory. Researchers following common practice can dupe themselves and others, however, because neither a theory's validity nor its invalidity is a matter of deductive certainty.

Chow's (1988) idea that "the role of a statistical analysis . . . is to supply the investigator with the minor premise for the syllogistic argument" (p. 108) also raises issues about the use of effect sizes:

All that is required of a statistical analysis is a binary decision. . . . because the validity of the syllogistic argument requires only that information. Even if a quantitatively more informative index is available (e.g., effect size, the amount of variance accounted for, or the power of the test), it will still be used in a binary manner. . . . Nothing is gained by using an effect-size estimate. . . . (p. 108)

Chow claimed that significance tests are superior to effect sizes on the basis of the logic of syllogistic argument. I interpret the same syllogism differently, making the role of effect sizes once more an open issue.

No Proof of Validity: Problems in Affirming Consequences

The heading of the third column in Table 1 (Affirming consequent) is also the name of a logical fallacy, which should immediately raise suspicions about imposing either/or (binary) decisions when judging theories. The fallacy of affirming the consequent involves reasoning backward from the validity of conclusions to the presumed validity of their premises. Suppose someone claims that "if we follow the principles of supply-side economics, the economy will prosper" because "the election of a supply-side President caused economic prosperity."
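Because syllogistic validity is all-or-none, it can be checked mechanically: an argument form is valid exactly when no assignment of truth values makes every premise true and the conclusion false. The following Python sketch is a minimal illustration, not part of Chow's or this article's apparatus; the helper names `implies`, `counterexamples`, and `is_valid` and the labels A ("the theory is true") and B ("the predicted result occurs") are hypothetical. It enumerates the four truth assignments and confirms that modus ponens and modus tollens are formally valid while affirming the consequent is not. It assumes strictly two-valued propositions and checks formal validity only; it says nothing about how experimental evidence should bear on a theory.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical helpers for illustration; each proposition is strictly True or False.
Prop = Callable[[bool, bool], bool]


def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def counterexamples(premises: List[Prop], conclusion: Prop) -> List[Tuple[bool, bool]]:
    """Return every (A, B) assignment where all premises hold but the conclusion fails."""
    return [
        (a, b)
        for a, b in product([True, False], repeat=2)
        if all(premise(a, b) for premise in premises) and not conclusion(a, b)
    ]


def is_valid(premises: List[Prop], conclusion: Prop) -> bool:
    """A form is valid iff it has no counterexample (validity is all-or-none)."""
    return not counterexamples(premises, conclusion)


# A = "the theory is true", B = "the predicted result occurs" (illustrative labels).
FORMS = {
    "modus ponens": ([lambda a, b: implies(a, b), lambda a, b: a], lambda a, b: b),
    "modus tollens": ([lambda a, b: implies(a, b), lambda a, b: not b], lambda a, b: not a),
    "affirming the consequent": ([lambda a, b: implies(a, b), lambda a, b: b], lambda a, b: a),
}

if __name__ == "__main__":
    for name, (premises, conclusion) in FORMS.items():
        print(f"{name}: valid={is_valid(premises, conclusion)}, "
              f"counterexamples={counterexamples(premises, conclusion)}")
```

Run as a script, this prints `valid=True` with no counterexamples for the first two forms and `valid=False, counterexamples=[(False, True)]` for affirming the consequent. That single row, A false and B true, is the case in which the predicted result occurs even though the theory is false, for instance because something else produced it.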
Among other interpretive problems (e.g., what does it mean to implement supply-side policies; what measure reveals the degree to which the President had in fact done so; and how should economic prosperity be measured?), there is a clear danger of spurious correlation: How can one be sure that prosperity was not caused by something else? It is also easy to see the danger of affirming the consequent from examples of fallacious reasoning such as the following: "If Hitler is still alive, the sun is hot." "The sun is hot." "Therefore Hitler is still alive."

Chow argued for all-or-nothing, dichotomous choices, even while using the word "probably" twice in the third column of Table 1, yet the reference to probability rather than to certainty is demanded by the fallacious nature of consequence-affirming arguments: Because it is logically false to conclude with deductive certainty that true premises are entailed by true consequences, any evidence for the consequences' validity at best provides only arguments consistent with the validity of the premises. It is more accurate to say that experimental results are consistent with a hypothesis than to say that they confirm it.

No Proof of Invalidity: Problems with Modus Tollens

Table 1 also misrepresents modus tollens logic in the second column, where "T₁ is false" misleadingly implies that a theory is