The importance of symmetry and virtual views in three-
dimensional object recognition
T. Vetter*, T. Poggio and H.H. Bülthoff*
Center for Biological and Computational Learning & Artificial Intelligence Laboratory, Department of Brain and Cognitive Sciences,
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: Human observers can recognize three-
dimensional objects seen in novel orientations, even
when they have previously seen only a relatively
small number of different views of the object. How
our visual system does this is a key problem in vision
research. Recent theories and experiments suggest
that the human visual system might store a relatively
small number of sample two-dimensional views of
a three-dimensional object, and recognize novel
views by a process of interpolation between the
stored sample views. These sample views may be
collected during a training phase as the visual system
familiarizes itself with the object.
Results: Here, we investigate whether constraints on
the shapes of objects commonly encountered in the
real world can reduce the number of training views
required for recognition of three-dimensional objects.
We are particularly concerned with the constraint
of object symmetry. We show that if an object is
bilaterally symmetrical, then additional 'virtual views'
can automatically be generated from one sample view
by symmetry transformations. These virtual views
should make it easier to recognize novel views of a
symmetric object than of an asymmetric one when a
single sample view has been seen. Recognition should be
particularly facilitated when the novel views are close
to the virtual views. We present psychophysical results
that bear out these predictions.
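The virtual-view construction for a bilaterally symmetric object can be sketched in a few lines (a minimal illustration, assuming orthographic projection; the feature points and the symmetry pairing below are hypothetical):

```python
import numpy as np

def virtual_view(view, pairing):
    """Generate a 'virtual view' from one sample view of a bilaterally
    symmetric object (a sketch, assuming orthographic projection).

    view    : (N, 2) image-plane coordinates of N labeled feature points.
    pairing : length-N index array sending each point to its mirror
              partner (points on the symmetry plane map to themselves).

    Reflecting the image about a vertical axis and relabeling each point
    with its symmetric partner gives the projection the same object casts
    from the mirror-reflected viewpoint: a new, legitimate view.
    """
    reflected = view * np.array([-1.0, 1.0])   # mirror the x-coordinates
    return reflected[pairing]                  # relabel symmetric partners

# Hypothetical sample view: points 0 and 1 are mirror partners,
# points 2 and 3 lie on the object's symmetry plane.
sample = np.array([[1.2, 0.5], [-0.8, 0.4], [0.1, 1.0], [0.2, -1.0]])
pairing = np.array([1, 0, 2, 3])
virtual = virtual_view(sample, pairing)  # view from the mirrored viewpoint
```

Applying the transformation twice returns the original view, and a perfectly frontal, mirror-symmetric image is its own virtual view, so the construction adds information only for sample views away from the symmetry plane.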
Conclusions: Our results show that the human visual
system can indeed exploit symmetry to facilitate
object recognition, and support the model for object
recognition in which a small number of two-dimen-
sional views are remembered and combined to
recognize novel views of the same object. These
results raise questions about how symmetry is recog-
nized, and symmetry transformations implemented, in
real, biological neural networks.
Current Biology 1994, 4:18-23
Background
The two-dimensional image formed by a three-dimen-
sional object changes with viewpoint. This creates a
problem for any visual system, artificial or natural,
which must recognize a three-dimensional object from
a previously unseen view. Theoretical results show that
if a full, three-dimensional model of the object is
available, novel views can be recognized by registering
and comparing them with two-dimensional projections
of the three-dimensional model, provided the corre-
spondence between object feature points in the novel
view and model projection is known. Alternatively, the
theory also shows that a small number of stored two-
dimensional model views may be sufficient for
recognition of novel views. For instance, under the
assumption of orthographic projection (a parallel pro-
jection in which the direction of the projection and the
normal of the projection plane coincide) and in the
absence of self-occlusions, the theoretical lower limit
for the number of necessary views for recognition is
two (the '1.5 views theorem' [1,2]). For these particu-
lar results to hold, a view must be defined as a 2N
vector (x1, y1, x2, y2, ..., xN, yN) of the coordinates in
the image plane of N labeled and visible feature points
on the object. All features are assumed to be visible, as
they are in wire-frame objects (Figs 1 and 2).
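A view in this sense is straightforward to compute: rotate the object's feature points and drop the depth coordinate, as orthographic projection prescribes. A minimal sketch (the wire-frame vertices and the rotation angle are hypothetical):

```python
import numpy as np

def ortho_view(points3d, rotation):
    """Project N labeled 3-D feature points to a 2N view vector
    (x1, y1, x2, y2, ..., xN, yN) under orthographic projection:
    rotate the object, then simply discard the depth coordinate."""
    rotated = points3d @ rotation.T   # (N, 3) rotated object points
    return rotated[:, :2].ravel()     # keep x, y; flatten to length 2N

def rot_y(theta):
    """Rotation by angle theta about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical wire-frame object: four vertices in object coordinates.
wire = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.5, 0.2],
                 [0.3, 1.0, -0.4],
                 [-0.5, 0.2, 0.8]])

view0 = ortho_view(wire, rot_y(0.0))         # frontal view, length 2N = 8
view30 = ortho_view(wire, rot_y(np.pi / 6))  # same object, 30 degrees away
```

Because all vertices of a wire-frame object remain visible, every view of it yields a complete 2N vector with no missing entries.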
Psychophysical experiments [3,4], using wire-frame and
other objects, suggest that a relatively small number
(but significantly more than two - around twenty) of
views are used by the human visual system, which
seems capable of generalizing to novel views by 'inter-
polating' between a few model views. These
experiments do not agree with the optimal theoretical
bounds described above, but are instead consistent
with a network model, based on the theory of Radial
Basis Functions (RBF), proposed by Poggio and
Edelman [5]. In this model, each hidden unit is considered
to be similar to a view-centered neuron tuned to
one of the example views, or to prototypical views
found by the network during the learning stage,
whereas the output can be view-independent if enough
training views are provided. In this model, a view
may consist of feature values more general than the
x, y coordinates of distinctive feature points in the
image, a possibility that seems more plausible from the
biological point of view.
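Such an interpolation network can be sketched with one Gaussian hidden unit per stored example view and output weights fit on the training views (a minimal sketch of the scheme in [5]; the example views, targets and width sigma below are hypothetical):

```python
import numpy as np

def rbf_recognizer(example_views, targets, sigma):
    """Fit a Gaussian RBF network on stored example views (a sketch of
    the interpolation scheme of [5], with hypothetical parameters).

    example_views : (K, 2N) array, one stored view per row.
    targets       : (K,) desired outputs (e.g. 1.0 = 'this object').
    Returns a function mapping a novel view vector to a score.
    """
    def activations(views):
        # Each hidden unit responds with a Gaussian of the distance
        # between the input view and its stored example view.
        d2 = ((views[:, None, :] - example_views[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    # Solve for the output weights by least squares on the training views.
    H = activations(example_views)                       # (K, K)
    weights, *_ = np.linalg.lstsq(H, targets, rcond=None)

    def recognize(view):
        act = activations(view[None, :])[0]              # hidden activities
        return float(act @ weights)
    return recognize

# Hypothetical training set: three stored 2N view vectors of one object.
examples = np.array([[0.0, 0.0, 1.0, 1.0],
                     [1.0, 0.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0, 0.0]])
rec = rbf_recognizer(examples, targets=np.ones(3), sigma=1.0)
score = rec(examples[0])   # high score on a stored view
```

A novel view that is close, in feature-coordinate space, to a stored real or virtual view activates that view's unit strongly and so receives a high score; this is the sense in which the network 'interpolates' between sample views.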
Results
Theoretical results
The key problem in all schemes for learning from
examples, such as RBF networks and various types of
neural networks, is the number of required examples
for a given task. Often too few examples are
available or obtainable. A case in point is
the recognition of a three-dimensional object such as a
face from a single training or model view. An attractive
© Current Biology 1994, Vol 4 No 1
*Present address: Max-Planck-Institut für biologische Kybernetik, 72076 Tübingen, Germany. Correspondence to: T. Vetter.