Journal of Educational Psychology 1976, Vol. 68, No. 6, 742-753

Experimental Validation of Two Classroom Observation Systems

J. A. Wilson, B. J. Spelman, and Karen J. Trew
Northern Ireland Council for Educational Research Research Unit, Queen's University of Belfast

This experiment studied the theoretical structure of two classroom observation systems and the extent to which the systems discriminated between experimentally ordered aspects of classroom situations. Both systems, Brown's Teacher Practices Observation Record and Denny's Classroom Creativity Observation Schedule, were used by trained student teacher observers in 36 classrooms in nine Northern Ireland primary schools. Results indicate that the factor structure of the observation schedule was in close agreement with the two dimensions postulated by Denny, whereas the observation record sampled several dimensions of teacher control and classroom organization. Both systems discriminated consistently between teachers, schools, curriculum areas, and curriculum content. A dimension of convergency-divergency, which distinguished math lessons, as convergent, from English lessons, as divergent, was common to both systems.

Emmer and Peck (1973) have argued that a proliferation of systems of systematic classroom observation can only lead to taxonomic confusion unless more attention is paid to the internal dimensionality of categories and relationships between category systems. They point out that where descriptive categories are formed solely on a logical or theoretical basis, the position is analogous to that of the test developer who arbitrarily defines his subscales from sets of items without regard to internal consistency or relationship with other tests.

We are pleased to acknowledge our indebtedness to D. Cameron and D. McNeill, Stranmillis College of Education, P. McConnellogue and J. McEvoy, St. Joseph's College of Education, and F. Magee, St.
Mary's College of Education, for helping to guide the study through its early stages; J. McCann, St. Joseph's, for making the training videotapes; J. J. Campbell, Director of the Queen's University Institute of Education, and Daphne Abraham, Organiser of the Queen's University Teachers' Centre, for making available accommodation for training observers; the 12 principals and 48 teachers for making their schools and classrooms available, and finally the 24 student teachers who sustained an intensive 8-week program of training and observation without loss of data. Requests for reprints should be sent to J. A. Wilson, NICER Research Unit, Queen's University of Belfast, Belfast, Northern Ireland BT9 5BS.

In their discussion of the reliability issue in classroom observation research, Medley and Mitzel (1963) underlined the danger of emphasizing interobserver agreement to the neglect of the stability of observed behavior over occasions. McGaw, Wardrop, and Bunda (1972), however, have pointed out that context or situational variables have been neglected or mistreated in the design and analysis of observational studies and suggest that variance in classroom behavior from situation to situation may be more ordered than random, with stability to be expected only over separate occasions within each situation. Frick and Semmel (Note 1), who noted that such situations had not been defined by McGaw et al., suggest that they might include such context variables as subject matter, class size, seating arrangements, group structure, nature of teacher and pupil task, time of day, and so on. They agree with McGaw et al. that, in keeping with the purposes of the investigation, several coefficients of reliability, or generalizability, can be derived, each estimating the extent to which elements of a particular facet or situation in the design can be consistently discriminated from each other. The present study was designed to ex-