The importance of symmetry and virtual views in three-dimensional object recognition

T. Vetter*, T. Poggio and H.H. Bülthoff*

Center for Biological and Computational Learning & Artificial Intelligence Laboratory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Background: Human observers can recognize three-dimensional objects seen in novel orientations, even when they have previously seen only a relatively small number of different views of the object. How our visual system does this is a key problem in vision research. Recent theories and experiments suggest that the human visual system might store a relatively small number of sample two-dimensional views of a three-dimensional object, and recognize novel views by a process of interpolation between the stored sample views. These sample views may be collected during a training phase as the visual system familiarizes itself with the object.

Results: Here, we investigate whether constraints on the shapes of objects commonly encountered in the real world can reduce the number of training views required for recognition of three-dimensional objects. We are particularly concerned with the constraint of object symmetry. We show that if an object is bilaterally symmetrical, then additional 'virtual views' can automatically be generated from one sample view by symmetry transformations. These virtual views should make it easier to recognize novel views of a symmetric than of an asymmetric object when a single sample view has been seen. Recognition should be particularly facilitated when the novel views are close to the virtual view. We present psychophysical results that bear out these predictions.
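Under orthographic projection, the symmetry transformation that generates a virtual view has a particularly simple form: mirror-reflect the image about the vertical axis and exchange the labels of each symmetric pair of feature points. The following is a minimal sketch of this idea, not the authors' implementation; the toy object, rotation angles, and function names are invented for illustration:

```python
import numpy as np

def virtual_view(view_xy, pairs):
    """From one orthographic view of a bilaterally symmetric object
    (an N x 2 array of labelled feature-point coordinates), generate
    the 'virtual view': swap the image coordinates of each symmetric
    pair of points, then mirror the image about the vertical axis."""
    virtual = view_xy.copy()
    for i, j in pairs:
        virtual[i], virtual[j] = view_xy[j], view_xy[i]
    virtual[:, 0] *= -1.0                     # reflect x-coordinates
    return virtual

def rot_y(t):                                 # rotation about the vertical axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# A toy symmetric object: points 0/1 and 2/3 are mirror pairs (x -> -x).
pts = np.array([[1.0, 0.5, 2.0], [-1.0, 0.5, 2.0],
                [0.3, -1.0, 0.4], [-0.3, -1.0, 0.4]])
pairs = [(0, 1), (2, 3)]

sample = (pts @ rot_y(0.5).T)[:, :2]          # the single sample view
virtual = virtual_view(sample, pairs)         # computed from the image alone
mirrored = (pts @ rot_y(-0.5).T)[:, :2]       # true view from mirrored viewpoint
```

The computed virtual view coincides with the true view of the object from the mirrored viewpoint, so a single real sample view of a symmetric object yields a second training view for free.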
Conclusions: Our results show that the human visual system can indeed exploit symmetry to facilitate object recognition, and support the model of object recognition in which a small number of two-dimensional views are remembered and combined to recognize novel views of the same object. These results raise questions about how symmetry is recognized, and symmetry transformations implemented, in real, biological neural networks.

Current Biology 1994, 4:18-23

Background

The two-dimensional image formed by a three-dimensional object changes with viewpoint. This creates a problem for any visual system, artificial or natural, that must recognize a three-dimensional object from a previously unseen view. Theoretical results show that if a full, three-dimensional model of the object is available, novel views can be recognized by registering and comparing them with two-dimensional projections of the three-dimensional model, provided the correspondence between object feature points in the novel view and in the model projection is known. Alternatively, the theory also shows that a small number of stored two-dimensional model views may be sufficient for recognition of novel views. For instance, under the assumption of orthographic projection (a parallel projection in which the direction of the projection and the normal of the projection plane coincide) and in the absence of self-occlusions, the theoretical lower limit on the number of views necessary for recognition is two (the '1.5 views theorem' [1,2]). For these particular results to hold, a view must be defined as a 2N vector (x1, y1, x2, y2, ..., xN, yN) of the coordinates in the image plane of N labeled and visible feature points on the object. All features are assumed to be visible, as they are in wire-frame objects (Figs 1 and 2).
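One way to see why so few model views can suffice: under orthographic projection of a rigid object (rotation only, no translation), every coordinate of a novel view is a fixed linear combination of the coordinate columns of two stored views, with coefficients that are the same for every feature point. A minimal numerical sketch of this fact, with an invented point set and rotation angles:

```python
import numpy as np

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def ortho_view(points, R):
    """Orthographic projection: rotate, keep the image-plane (x, y)."""
    return (points @ R.T)[:, :2]

rng = np.random.default_rng(0)
points = rng.standard_normal((6, 3))          # 6 labelled feature points

v1 = ortho_view(points, rot_y(0.2))           # two stored model views
v2 = ortho_view(points, rot_y(0.9))
novel = ortho_view(points, rot_y(0.55))       # a previously unseen view

# Recover the per-view mixing coefficients from the stored views alone;
# the novel x-coordinates lie in the span of the stored coordinate columns.
A = np.column_stack([v1[:, 0], v1[:, 1], v2[:, 0], v2[:, 1]])
coeffs, *_ = np.linalg.lstsq(A, novel[:, 0], rcond=None)
residual = np.abs(A @ coeffs - novel[:, 0]).max()
```

The residual is zero up to floating-point error, i.e. the novel view is exactly predicted from the two stored views once feature correspondence is known.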
Psychophysical experiments [3,4], using wire-frame and other objects, suggest that a relatively small number (but significantly more than two - around twenty) of views are used by the human visual system, which seems capable of generalizing to novel views by 'interpolating' between a few model views. These experiments do not agree with the optimal theoretical bounds described above, but are instead consistent with a network model, based on the theory of Radial Basis Functions (RBF), proposed by Poggio and Edelman [5]. In this model, each hidden unit is considered to be similar to a view-centered neuron tuned to one of the example views, or to prototypical views found by the network during the learning stage, whereas the output can be view-independent if enough training views are provided. In this model, a view may consist of feature values more general than the x, y coordinates of distinctive feature points in the image, a possibility that seems more plausible from the biological point of view.

Results

Theoretical results

The key problem in all schemes for learning from examples, such as RBF networks and various types of neural networks, is the number of examples required for a given task. Often, an insufficient number of examples is available or obtainable. A case in point is the recognition of a three-dimensional object such as a face from a single training or model view. An attractive

*Present address: Max-Planck-Institut für biologische Kybernetik, 72076 Tübingen, Germany. Correspondence to: T. Vetter.
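The RBF network described above can be sketched in a few lines: Gaussian hidden units tuned to the example views, with linear output weights chosen so that the network responds to every training view. This is only an illustrative toy, not the Poggio-Edelman implementation; the object, viewing angles, and width sigma are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def view(points, t):
    """A view as the 2N vector of image-plane feature coordinates."""
    return ((points @ rot_y(t).T)[:, :2]).ravel()

points = rng.standard_normal((8, 3))              # one wire-frame-like object
angles = np.linspace(-1.0, 1.0, 5)                # five example views
centers = np.array([view(points, t) for t in angles])

sigma = 1.0
def hidden(v):
    """Gaussian hidden units, one tuned to each example view."""
    return np.exp(-np.sum((centers - v) ** 2, axis=1) / (2 * sigma ** 2))

# Output weights: make the network respond 1.0 to every training view.
G = np.array([hidden(c) for c in centers])
w = np.linalg.solve(G, np.ones(len(centers)))

def recognize(v):
    return hidden(v) @ w

same_novel = view(points, 0.37)                   # unseen view, same object
other = view(rng.standard_normal((8, 3)), 0.0)    # view of a different object
```

By construction the network responds with 1.0 on each training view; a novel view of the same object falls near the tuned units and scores much higher than a view of a different object, which is the sense in which the network 'interpolates' between stored views.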