SEGMENTED SHAPE DESCRIPTIONS FROM 3-VIEW STEREO*

Parag Havaldar and Gérard Medioni
Institute of Robotics and Intelligent Systems
University of Southern California
Los Angeles, California 90089-0273
havaldar@iris.usc.edu, medioni@iris.usc.edu

Abstract: We address the recovery of segmented, 3-D descriptions of an object from intensity images. We use three views of an object from slightly different viewpoints as our input. For each image we extract a hierarchy of groups based on proximity, parallelism and symmetry in a robust manner. The groups in the three images are matched by computing the epipolar geometry. For each set of matched groups from the three images, we then label the contours of the groups as “true” or “limb” edges. Using the information about groups, the label associated with their contours and projective properties of subclasses of Generalized Cylinders, we infer the 3-D structure of these groups. The proposed method not only allows robust shape recovery but also produces segmented parts. Our approach can also deal with groups generated as a result of texture or shadows on the object. We present results on real images of moderately complex objects.

Key Words: Shape description, Grouping, Stereo, Edge Labeling, Generalized Cylinders.

1 Introduction

Inference and description of the 3-D shape of objects from one or very few intensity images is an important and certainly a challenging problem in computer vision. Such descriptions provide compact representations which can be used to recognize objects, manipulate them, navigate around them and learn new objects. In this paper we address the problem of recovering the shape of an object from three intensity views taken from close viewpoints, and derive volumetric, segmented (or part-based) descriptions in the presence of noise, texture, shadows.
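As stated in the abstract, groups are matched across the three images using epipolar geometry. The following Python sketch illustrates one standard way in which three views constrain a match: given a candidate correspondence in images 1 and 2, its position in image 3 is predicted as the intersection of two epipolar lines (often called epipolar transfer). This is an illustration under stated assumptions, not the authors' implementation: the paper matches groups rather than isolated points, and the camera matrices, the function names (`skew`, `fundamental`, `transfer`) and the synthetic scene are all hypothetical.

```python
import numpy as np

def skew(v):
    """3x3 cross-product matrix, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental(P_src, P_dst):
    """Fundamental matrix F with x_dst^T F x_src = 0, from two 3x4 cameras."""
    # The camera centre of P_src is its right null vector.
    _, _, Vt = np.linalg.svd(P_src)
    centre = Vt[-1]
    epipole = P_dst @ centre            # image of that centre in the other view
    return skew(epipole) @ P_dst @ np.linalg.pinv(P_src)

def transfer(x1, x2, F13, F23):
    """Predict the image-3 point as the intersection of two epipolar lines."""
    line_from_1 = F13 @ x1              # epipolar line of x1 in image 3
    line_from_2 = F23 @ x2              # epipolar line of x2 in image 3
    x3 = np.cross(line_from_1, line_from_2)
    return x3 / x3[2]

# Hypothetical set-up: identical intrinsics, three close, non-collinear viewpoints.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def make_camera(t):
    """Camera K [I | t] with identity rotation (an assumption for brevity)."""
    return K @ np.hstack([np.eye(3), np.reshape(t, (3, 1))])

P1 = make_camera([0.0, 0.0, 0.0])
P2 = make_camera([-0.2, 0.0, 0.0])
P3 = make_camera([0.0, -0.2, 0.0])

def project(P, X):
    """Project a homogeneous 3-D point and normalise the result."""
    x = P @ X
    return x / x[2]

X = np.array([0.3, -0.1, 4.0, 1.0])     # synthetic 3-D point
x1, x2, x3_true = project(P1, X), project(P2, X), project(P3, X)

F13 = fundamental(P1, P3)
F23 = fundamental(P2, P3)
x3_pred = transfer(x1, x2, F13, F23)
print(x3_pred, x3_true)
```

The prediction is only well defined when the two epipolar lines in image 3 are distinct. It degenerates when the three camera centres are collinear or when the 3-D point lies on the plane through the three centres, which is why the synthetic cameras above are deliberately placed off a common line.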
We propose to obtain these segmented, volumetric descriptions in four steps:

Grouping: In each image we extract groups by detecting parallel and skew symmetries. These groups are extracted hierarchically to enable robust performance in the presence of noise.

Matching: The groups detected in each image are matched across the three views. These correspondences are established by using the epipolar geometry between all pairs of images.

Labelling: The contours of matched groups in the three images are labelled as “true” or “limb” edges. This labelling provides information about the shape of the underlying surface.

3-D Inference: The groups whose contours are labelled as limb edges are hypothesized to come from smooth Generalized Cylinders (GCs), while the groups whose contours are labelled as true edges are hypothesized to come from surfaces.

We start by motivating our approach in the light of existing theories of object recognition, and review relevant work.

* This research was supported in part by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Air Force Office of Scientific Research under Grant No. F49620-93-1-0620. The United States Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon.

2 Motivation and Previous Work

Humans seem to understand the shape of objects from intensity images with little effort, even if the objects are occluded, novel, rotated in depth or extensively degraded. However, computer vision has made only limited progress in this direction. Here, we address a few issues which have motivated our work.

Do we need to infer 3-D descriptions?

There are two schools of thought on human perception: one postulates the storage of specific viewpoints [5], and the other posits the use of viewpoint-invariant volumetric primitives [1]. We do not see these theories as being contradictory.
Tasks such as recognition of an object in the scene require rich and stable descriptions. Volumetric descriptions provide such rich descriptions. Manipulation tasks require an understanding of the object’s pose. Predefined poses of the object may be used to help this process.

What kinds of volumetric descriptions?

To address the complexity of objects, we need to represent them in a manner that makes their interpretation tractable. Representing objects in terms of parts reduces such complexity. In psychology, Biederman [1] has proposed that part-based volumetric representations based on geons appear to be psychologically plausible. Similar ideas based on Generalized Cylinders (henceforth GCs) have been proposed by Nevatia and Binford [13] in computer vision. We hypothesize that part-based volumetric descriptions based on subsets of GCs constitute a suitable level of shape abstraction for large classes of complex 3-D objects. Specifically, the subclass of GCs used are those that give rise to perceptual properties such as symmetries and closures in the image. As a result of this representation scheme, however, objects which are better described by statistical features, for example waves or bushes, cannot be dealt with.

0-8186-7042-8/95 $4.00 © 1995 IEEE

How to infer these segmented 3-D descriptions?

Deriving descriptions from real images, in a data-driven fashion, poses many problems. Among them are discontinuous boundaries, the presence of texture and shadows, occlu-