Journal of Experimental Psychology: Human Perception and Performance
1995, Vol. 21, No. 6, 1506-1514
Copyright 1995 by the American Psychological Association, Inc. 0096-1523/95/$3.00

Viewpoint-Dependent Mechanisms in Visual Object Recognition: Reply to Tarr and Bülthoff (1995)

Irving Biederman, University of Southern California
Peter C. Gerhardstein, Rutgers University

I. Biederman and P. C. Gerhardstein (1993) demonstrated that a representation specifying a distinctive arrangement of viewpoint-invariant parts (a geon structural description [GSD]) dramatically reduced the costs of rotation in depth. M. J. Tarr and H. H. Bülthoff (1995) attempt to make a case for viewpoint-dependent mechanisms, such as mental rotation. Their suggestion that GSDs enjoy no special status in reducing the effects of depth rotation is contradicted by a wealth of direct experimental evidence as well as an inadvertent experiment that found no evidence for the spontaneous employment of mental rotation. Their complaint against geon theory's account of entry-level classification rests on a mistaken and unwarranted attribution that geon theory assumes a one-to-one correspondence between GSDs and entry-level names. GSDs provide a representation that distinguishes most entry- and subordinate-level classes and explains why complex objects are described as an arrangement of viewpoint-invariant parts.

Consider the nonsense object in Figure 1. When first viewed, how did the reader know that the object was one never encountered previously? Why was the reader fairly confident that he or she would know what the object would look like if rotated 30°? The large central block would still look like a block and the vertical cylinder and wedge on top of the block would still be on top of the block. The zigzag cross brace connecting the tilting cylinder (ending in a cone) to the wedge would still enjoy the same relation if rotated 30°.
These words denoting parts and relations are easily matched to the corresponding regions of the image.

Geon theory (Biederman, 1987; Hummel & Biederman, 1992) seeks to account for these readily evident capacities and characteristics of human object recognition by positing that objects are represented as an arrangement of simple viewpoint-invariant parts (geons) and relations, termed a geon structural description (GSD). The resultant viewpoint-invariant representation is designed to account for many of the entry-level shape-based classifications, such as distinguishing between a chair, an elephant, and a frying pan. The theory also provides an account of the vast majority of subordinate-level classifications that people readily make when they distinguish, for example, a square table from a round one or a four-legged chair from a swivel or a rocking chair.

Irving Biederman, Department of Psychology, University of Southern California; Peter C. Gerhardstein, Department of Psychology, Rutgers University. This research was supported by a grant from the U.S. Army Night Vision and Electronic Sensors Directorate Army Research Office DAAH04-94-G-0065. We express our appreciation to Eric E. Cooper, Moshe Bar, and Jozsef Fiser for their helpful comments on this article. Correspondence concerning this article should be addressed to Irving Biederman, Department of Psychology, Hedco Neuroscience Building, MC 2520, University of Southern California, Los Angeles, California 90089-2520. Electronic mail may be sent via Internet to ib@rana.usc.edu.

Biederman and Gerhardstein (1993) presented evidence showing that when objects differ in their GSDs, recognition from a novel viewpoint can be readily achieved, as long as the novel viewpoint would activate the same GSD (i.e., as long as the same viewpoint-invariant parts and relations among these parts were apparent in the image).
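
The matching condition just stated can be made concrete with a small toy sketch. It is only an illustration under stated assumptions, not the authors' model and not the network implementation of Hummel and Biederman (1992): the names `GSD`, `make_gsd`, and `activates_same_gsd` are hypothetical, geons are reduced to plain labels, and a relation is kept only when both of its parts are visible in the view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GSD:
    """Toy geon structural description: visible parts plus relations among them."""
    geons: frozenset
    relations: frozenset


def make_gsd(visible_geons, relations):
    """Build a description from the parts apparent in one view (hypothetical helper)."""
    geons = frozenset(visible_geons)
    # Assumption: a relation is apparent only if both of its parts are visible.
    kept = frozenset(
        (a, rel, b) for (a, rel, b) in relations if a in geons and b in geons
    )
    return GSD(geons, kept)


def activates_same_gsd(studied, tested):
    """True when the tested view yields the same parts and relations as the studied view."""
    return studied == tested


# Relations loosely modeled on the Figure 1 description in the text.
RELATIONS = [
    ("cylinder", "on-top-of", "block"),
    ("wedge", "on-top-of", "block"),
]

view_0 = make_gsd(["block", "cylinder", "wedge"], RELATIONS)    # studied view
view_30 = make_gsd(["block", "cylinder", "wedge"], RELATIONS)   # rotated 30 degrees, same parts apparent
view_hidden = make_gsd(["block", "cylinder"], RELATIONS)        # a view in which the wedge is not apparent

print(activates_same_gsd(view_0, view_30))      # True
print(activates_same_gsd(view_0, view_hidden))  # False
```

In this sketch the angle of rotation never enters the comparison; only the set of apparent parts and relations does, which is the sense in which the description is viewpoint invariant.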
If a set of stimuli could not be distinguished by their GSDs, as would occur with a set of bent paper clips differing only in the angles between their segments (such as those used by Bülthoff & Edelman, 1992, and Edelman & Bülthoff, 1992) or a set of blocks all at right angles to each other and differing primarily in length and attachment direction (as in Tarr's objects, 1989), strong viewpoint dependency would be expected. To the extent that rotation in depth partially changed the GSD, as would occur, for example, if some of the parts were occluded and others revealed, weaker activation of the original unit representing the object would occur and consequently some cost in recognition performance would be expected (Biederman, 1987; Hummel & Biederman, 1992).

Tarr and Bülthoff (1995) take issue with Biederman and Gerhardstein's (1993) position, preferring an account that assigns a central role to viewpoint-dependent mechanisms. According to this account, participants learn a representation of the image projected by an object when viewed at its particular orientation. If an object is viewed from a new orientation, a mechanism (e.g., mental rotation, interpolation, extrapolation) is used that incurs a cost proportional to the angular disparity between original and tested views. Unfortunately, Tarr and Bülthoff (1995) do not commit