Segmentation and Modelling of Visually Symmetric Objects by Robot Actions

Wai Ho Li and Lindsay Kleeman
Intelligent Robotics Research Centre
Department of Electrical and Computer Systems Engineering
Monash University, Clayton, Victoria 3800, Australia
{Wai.Ho.Li, Lindsay.Kleeman}@eng.monash.edu.au

Abstract—Robots usually carry out object segmentation and modelling passively: sensors such as cameras are actuated by the robot without disturbing objects in the scene. In this paper, we present an intelligent robotic system that physically moves objects in an active manner to perform segmentation and modelling using vision. By visually detecting bilateral symmetry, our robot is able to segment and model objects through controlled physical interactions. Extensive experiments show that our robot is able to accurately segment new objects autonomously. We also show that our robot is able to leverage segmentation results to autonomously learn visual models of new objects by physically grasping and rotating them. Object recognition experiments confirm that the robot-learned models allow robust recognition. Videos of the robotic experiments are available from Multimedia Extensions 1, 2 and 3.

Index Terms—fast symmetry, real time, computer vision, autonomous, segmentation, robotics, object recognition, SIFT, interactive learning, object manipulation, grasping

I. INTRODUCTION

The ability to perform object segmentation and modelling used to be the exclusive domain of higher primates. Over time, computer vision research has produced ever-improving systems that can segment and model objects. Modern techniques such as Interactive Graph Cuts [Boykov and Jolly, 2001] and Geodesic Active Contours [Markus et al., 2008] can produce accurate segmentations given some human guidance.
Similarly, visual features such as SIFT [Lowe, 2004], Gabor filter banks [Mutch and Lowe, 2006] and Haar wavelets [Viola and Jones, 2001] enable reliable object detection and recognition, especially when combined with machine learning methods such as Boosting using AdaBoost [Freund and Schapire, 1997]. However, these computer vision techniques rely heavily on a priori knowledge of objects and their surroundings, such as initial guesses of foreground-background pixels, which is difficult to obtain autonomously in real world situations.

This paper presents a robotic system that applies physical actions to segment and model new objects using vision. The system is composed of a robot arm that moves objects within its workspace inside the field of view of a stereo camera pair. The arm-camera geometry is configured to mimic a humanoid platform operating on objects supported by a flat table. A photo of our robotic system is shown in Figure 1. The checkerboard pattern is used to perform a once-off arm-camera calibration prior to the robotic experiments.

Fig. 1. Robot System Components

Physical actions can reduce the need for prior knowledge by providing foreground-background segmentation. However, a robot requires significant training and background information to perform object manipulations autonomously. By limiting our scope to objects whose axis of bilateral symmetry is perpendicular to a known plane, such as cups and bottles resting on a table, we propose a partial but robust solution to this problem. Given that many objects in domestic and office environments exhibit sufficient bilateral symmetry for our autonomous system, our symmetry-based approach can be employed in a wide variety of situations. Experiments show that our robot is able to autonomously segment and model new symmetric objects through the use of controlled physical actions.
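The bilateral symmetry detection that our approach relies on can be viewed as a Hough-style voting scheme over edge-point pairs: each pair of edge pixels votes for the mirror line (the perpendicular bisector of the segment joining them) that would map one point onto the other, and the line accumulating the most votes is taken as the symmetry axis. The sketch below is a minimal illustration of this idea under assumed bin sizes; function and parameter names are ours, not the fast symmetry implementation used by the robot.

```python
# Minimal sketch of Hough-style bilateral symmetry detection over edge
# points. Lines use the (r, theta) normal form: x*cos(theta) + y*sin(theta) = r.
# Bin counts and step sizes below are illustrative assumptions.
import math
from collections import Counter

def detect_symmetry_line(edge_points, n_theta=180, r_step=1.0):
    """Each pair of edge points votes for the mirror line that maps one
    point onto the other, i.e. the perpendicular bisector of the segment
    joining them. Returns the (theta, r) of the strongest line."""
    votes = Counter()
    pts = list(edge_points)
    for i in range(len(pts)):
        x1, y1 = pts[i]
        for j in range(i + 1, len(pts)):
            x2, y2 = pts[j]
            dx, dy = x2 - x1, y2 - y1
            if dx == 0 and dy == 0:
                continue
            # The mirror line's normal points along the inter-point direction.
            theta = math.atan2(dy, dx)  # in (-pi, pi]
            # The midpoint lies on the mirror line, so r = midpoint . normal.
            mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            r = mx * math.cos(theta) + my * math.sin(theta)
            # Fold theta into [0, pi) with a signed r so votes accumulate.
            if theta < 0:
                theta += math.pi
                r = -r
            elif theta >= math.pi:  # atan2 can return exactly pi
                theta -= math.pi
                r = -r
            t_bin = int(theta / math.pi * n_theta) % n_theta
            r_bin = round(r / r_step)
            votes[(t_bin, r_bin)] += 1
    (t_bin, r_bin), _ = votes.most_common(1)[0]
    return (t_bin * math.pi / n_theta, r_bin * r_step)
```

For edge points mirrored about the vertical line x = 5, the pairs of mirrored points all vote for the same bin, so the detector recovers theta near 0 (horizontal normal) and r near 5. The pairwise voting is O(n^2) in the number of edge points, which is why practical detectors restrict which pairs may vote (e.g. by gradient orientation) to reach real-time rates.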
Object recognition experiments confirm that the robot-collected models allow robust recognition of learned objects.

A. Object Segmentation

We define object segmentation as the task of finding all pixels in an image that belong to an object in the physical world. An object is defined as something that can be manipulated by our robot, such as a cup or bottle. Whereas image segmentation methods generally rely on consistency in adjacent pixels [Pal and Pal, 1993], [Skarbek and Koschan, 1994],