Ecological Monographs 54(2), 1984, pp. 187-211
© 1984 by the Ecological Society of America

PSEUDOREPLICATION AND THE DESIGN OF ECOLOGICAL FIELD EXPERIMENTS

STUART H. HURLBERT
Department of Biology, San Diego State University, San Diego, California 92182 USA

Abstract. Pseudoreplication is defined as the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent. In ANOVA terminology, it is the testing for treatment effects with an error term inappropriate to the hypothesis being considered. Scrutiny of 176 experimental studies published between 1960 and the present revealed that pseudoreplication occurred in 27% of them, or 48% of all such studies that applied inferential statistics. The incidence of pseudoreplication is especially high in studies of marine benthos and small mammals. The critical features of controlled experimentation are reviewed. Nondemonic intrusion is defined as the impingement of chance events on an experiment in progress. As a safeguard against both it and preexisting gradients, interspersion of treatments is argued to be an obligatory feature of good design. Especially in small experiments, adequate interspersion can sometimes be assured only by dispensing with strict randomization procedures. Comprehension of this conflict between interspersion and randomization is aided by distinguishing pre-layout (or conventional) and layout-specific alpha (probability of type I error). Suggestions are offered to statisticians and editors of ecological journals as to how ecologists' understanding of experimental design and statistics might be improved.

Key words: experimental design; chi-square; R. A. Fisher; W. S. Gossett; interspersion of treatments; nondemonic intrusion; randomization; replicability; type I error.
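The abstract's central point can be made concrete with a small simulation (not part of the original article; an illustrative sketch only). Here one treated and one control plot each receive ten subsamples, and the subsamples are analysed as if they were independent replicates. Because each plot also carries its own chance "plot effect," the nominal 5% test rejects the true null hypothesis of no treatment effect far more often than 5% of the time. All function names and parameter values below are hypothetical choices for illustration.

```python
# Illustrative sketch: why pseudoreplication inflates type I error.
# One treated plot, one control plot, n_sub subsamples per plot,
# analysed (incorrectly) as if the subsamples were replicates.
import random
import statistics

random.seed(42)

def welch_t(xs, ys):
    """Welch's t statistic for two independent samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    se = (vx / len(xs) + vy / len(ys)) ** 0.5
    return (mx - my) / se

def simulate(n_sims=2000, n_sub=10, plot_sd=1.0, sub_sd=1.0, t_crit=2.10):
    """Fraction of simulations declaring a 'treatment effect' when the
    true effect is zero.  plot_sd is between-plot variation (nondemonic
    intrusion and preexisting differences); sub_sd is within-plot
    sampling noise; t_crit approximates the two-sided 5% cutoff."""
    rejections = 0
    for _ in range(n_sims):
        p_treat = random.gauss(0, plot_sd)  # chance character of the treated plot
        p_ctrl = random.gauss(0, plot_sd)   # chance character of the control plot
        treat = [p_treat + random.gauss(0, sub_sd) for _ in range(n_sub)]
        ctrl = [p_ctrl + random.gauss(0, sub_sd) for _ in range(n_sub)]
        if abs(welch_t(treat, ctrl)) > t_crit:
            rejections += 1
    return rejections / n_sims

rate = simulate()
print(f"Type I error rate under pseudoreplication: {rate:.2f} (nominal 0.05)")
```

With plot-to-plot variation comparable to subsampling noise, the rejection rate is several times the nominal 5%: the test detects the difference between the two particular plots, not a treatment effect. Replicating plots within each treatment, rather than subsampling single plots, restores the nominal error rate.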
No one would now dream of testing the response to a treatment by comparing two plots, one treated and the other untreated.
-R. A. Fisher and J. Wishart (1930)

. . . field experiments in ecology [usually] either have no replication, or have so few replicates as to have very little sensitivity . . .
-L. L. Eberhardt (1978)

I don't know how anyone can advocate an unpopular cause unless one is either irritating or ineffective.
-Bertrand Russell (in Clark 1976:290)

INTRODUCTION

The following review is a critique of how ecologists are designing and analyzing their field experiments. It is also intended as an exploration of the fundamentals of experimental design. My approach will be: (1) to discuss some common ways in which experiments are misdesigned and statistics misapplied, (2) to cite a large number of studies exemplifying these problems, (3) to propose a few new terms for concepts now lacking convenient, specific labels, (4) to advocate treatment interspersion as an obligatory feature of good design, and (5) to suggest ways in which editors can quickly improve matters.

Manuscript received 25 February 1983; revised 21 June 1983; accepted 25 June 1983.

Most books on experimental design or statistics cover the fundamentals I am concerned with either not at all or only briefly, with few examples of misdesigned experiments, and few examples representing experimentation at the population, community, or ecosystem levels of organization. The technical mathematical and mechanical aspects of the subject occupy the bulk of these books, which is proper, but which is also distracting to those seeking only the basic principles. I omit all mathematical discussions here.

The citing of particular studies is critical to the hoped-for effectiveness of this essay. To forego mention of specific negative examples would be to forego a powerful pedagogic technique.
Past reviews have been too polite and even apologetic, as the following quotations illustrate:

There is much room for improvement in field experimentation. Rather than criticize particular instances, I will outline my views on the proper methods . . . . (Connell 1974)

In this review, the writer has generally refrained from criticizing the designs, or lack thereof, of the studies cited and the consequent statistical weakness of their conclusions; it is enough to say that the majority of the studies are defective in these respects. (Hurlbert 1975)

. . . as I write my comments, I seem to produce only a carping at details that is bound to have the total effect of an ill-tempered scolding . . . . I hope those whose work I have referenced as examples will