Psychological Review
1987, Vol. 94, No. 2, 115-147
Copyright 1987 by the American Psychological Association, Inc. 0033-295X/87/$00.75

Recognition-by-Components: A Theory of Human Image Understanding

Irving Biederman
State University of New York at Buffalo

The perceptual recognition of objects is conceptualized to be a process in which the image of the input is segmented at regions of deep concavity into an arrangement of simple geometric components, such as blocks, cylinders, wedges, and cones. The fundamental assumption of the proposed theory, recognition-by-components (RBC), is that a modest set of generalized-cone components, called geons (N ≤ 36), can be derived from contrasts of five readily detectable properties of edges in a two-dimensional image: curvature, collinearity, symmetry, parallelism, and cotermination. The detection of these properties is generally invariant over viewing position and image quality and consequently allows robust object perception when the image is projected from a novel viewpoint or is degraded. RBC thus provides a principled account of the heretofore undecided relation between the classic principles of perceptual organization and pattern recognition: The constraints toward regularization (Prägnanz) characterize not the complete object but the object's components. Representational power derives from an allowance of free combinations of the geons. A Principle of Componential Recovery can account for the major phenomena of object recognition: If an arrangement of two or three geons can be recovered from the input, objects can be quickly recognized even when they are occluded, novel, rotated in depth, or extensively degraded. The results from experiments on the perception of briefly presented pictures by human observers provide empirical support for the theory.

Any single object can project an infinity of image configurations to the retina.
The orientation of the object to the viewer can vary continuously, each giving rise to a different two-dimensional projection. The object can be occluded by other objects or texture fields, as when viewed behind foliage. The object need not be presented as a full-colored textured image but instead can be a simplified line drawing. Moreover, the object can even be missing some of its parts or be a novel exemplar of its particular category. But it is only with rare exceptions that an image fails to be rapidly and readily classified, either as an instance of a familiar object category or as an instance that cannot be so classified (itself a form of classification).

A Do-It-Yourself Example

Consider the object shown in Figure 1. We readily recognize it as one of those objects that cannot be classified into a familiar category. Despite its overall unfamiliarity, there is near unanimity in its descriptions. We parse, or segment, its parts at regions of deep concavity and describe those parts with common, simple volumetric terms, such as "a block," "a cylinder," "a funnel or truncated cone." We can look at the zig-zag horizontal brace as a texture region or zoom in and interpret it as a series of connected blocks. The same is true of the mass at the lower left: we can see it as a texture area or zoom in and parse it into its various bumps.

Although we know that it is not a familiar object, after a while we can say what it resembles: "A New York City hot dog cart, with the large block being the central food storage and cooking area, the rounded part underneath as a wheel, the large arc on the right as a handle, the funnel as an orange juice squeezer and the various vertical pipes as vents or umbrella supports." It is not a good cart, but we can see how it might be related to one. It is like a 10-letter word with 4 wrong letters.

We readily conduct the same process for any object, familiar or unfamiliar, in our foveal field of view.
The manner of segmentation and analysis into components does not appear to depend on our familiarity with the particular object being identified. The naive realism that emerges in descriptions of nonsense objects may be reflecting the workings of a representational system by which objects are identified.

This research was supported by the Air Force Office of Scientific Research Grants F4962083C0086 and 86-0106. I would like to express my deep appreciation to Thomas W. Blickle and Ginny Ju for their invaluable contributions to all phases of the empirical research described in this article. Thanks are also due to Elizabeth A. Beiring, John Clapper, and Mary Lloyd for their assistance in the conduct of the experimental research. Discussions I had with James R. Pomerantz, John Artim, and Brian Fisher helped improve aspects of the manuscript. Correspondence concerning this article should be addressed to Irving Biederman, Department of Psychology, State University of New York at Buffalo, Park Hall, Amherst, New York 14260.

An Analogy Between Speech and Object Perception

As will be argued in a later section, the number of categories into which we can classify objects rivals the number of words that can be readily identified when listening to speech. Lexical access during speech perception can be successfully modeled as a process mediated by the identification of individual primitive elements, the phonemes, from a relatively small set of primitives (Marslen-Wilson, 1980). We only need about 44 phonemes to code all the words in English, 15 in Hawaiian, 55 to represent virtually all the words in all the languages spoken around the world. Because the set of primitives is so small and each pho