A framework for active vision-based robot control using neural networks

Rajeev Sharma and Narayan Srinivasa
The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N. Mathews Avenue, Urbana, IL 61801 (USA)

SUMMARY
Assembly robots that use an active camera system for visual feedback can achieve greater flexibility, including the ability to operate in an uncertain and changing environment. Incorporating active vision into a robot control loop involves some inherent difficulties, including calibration and the need to redefine the servoing goal as the camera configuration changes. In this paper, we propose a novel self-organizing neural network that learns a calibration-free spatial representation of 3D point targets in a manner that is invariant to changing camera configurations. This representation is used to develop a new framework for robot control with active vision. The salient feature of this framework is that it decouples active camera control from robot control. The feasibility of this approach is established with the help of computer simulations and experiments with the University of Illinois Active Vision System (UIAVS).

KEYWORDS: Active vision; Visual servoing; Assembly; Learning; Neural network.

1. INTRODUCTION
Visual feedback has great potential to increase the flexibility of robotic assembly operations, for example, by enabling operation in an imprecisely calibrated workcell and by handling unexpected changes in the workcell.[1] The visual feedback is usually provided either by a set of stationary cameras or by a camera-in-hand setup, where the camera is mounted on the assembly robot itself. However, either arrangement greatly limits the scope of the robotic tasks. For example, when using fixed cameras during a typical assembly operation, various portions of the workcell may go out of the camera's field of view, or be out of focus.
With a camera-in-hand setup, the view of the workcell is limited by the task being executed, and thus its usefulness may be restricted to tasks such as tracking. An alternative is to use active vision, where a separately mounted motorized camera setup can be independently and dynamically reconfigured during the course of an assembly operation (see Figure 1). In the recent past, it has been shown that active vision can greatly improve the process of image interpretation and vision-based control.[2-5] Although significant advances have been made in active vision research, much of its potential is still unrealized in robot control.[6]

Incorporating visual feedback into classical robot control leads to the visual servo control problem. A recent survey of the different mechanisms of visual feedback involved in visual servo control can be found in Corke.[7] In particular, an important distinction made is that of the feedback representation mode, which can be either position-based or image-based (see Figure 2). Position-based servoing uses the visual image of the scene to "reconstruct" the surrounding 3D environment. The absolute positions of the objects gathered from this reconstruction are used for robot motion planning and control. The position-based approach thus involves an image interpretation step (e.g. depth from stereo) in the control loop, which is difficult to implement with an active camera. On the other hand, an image-based servoing process bypasses the 3D world reconstruction and uses image features directly to control robot motion.[8-12] Image-based servoing observes how differential changes in robot configuration space relate to differential changes in image feature space, and then uses this derived relationship and the expected goal features to control robot motion. The disadvantage of the image-based approach is that the control goal is hard to specify with changing camera configurations.
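The differential relationship underlying image-based servoing is commonly captured by an image Jacobian that maps joint-space velocities to image-feature velocities. The following is a minimal illustrative sketch (not the method of this paper): the function names and the toy identity Jacobian are assumptions for demonstration only.

```python
import numpy as np

def servo_step(s, s_goal, J, gain=0.5):
    """One image-based visual servoing update.

    s, s_goal : current and goal image-feature vectors
    J         : image Jacobian relating joint velocities to feature velocities
    Returns a joint-space increment that drives s toward s_goal.
    """
    error = s_goal - s
    # Least-squares inverse handles the generally non-square Jacobian
    dq = gain * np.linalg.pinv(J) @ error
    return dq

# Toy example: 2 features, 2 joints, identity Jacobian (purely illustrative)
s = np.array([0.0, 0.0])
s_goal = np.array([1.0, 1.0])
J = np.eye(2)
for _ in range(20):
    s = s + J @ servo_step(s, s_goal, J)
print(np.allclose(s, s_goal, atol=1e-3))  # prints True
```

Note that in practice J depends on the camera configuration, which is exactly why the image-based control goal becomes hard to maintain when the camera moves.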
Thus, there are many issues to be addressed before an active camera can be used for robot control. One major issue is the calibration of the different components of the robot/camera system. Another important issue is that of defining the control goal as the camera configuration changes. In this paper we address these issues and propose a framework for active vision-based control that exploits unique properties of a 3D spatial representation learned by a neural network. This learning is achieved by a novel neural network that is easy to implement on a robotic active vision system and is capable of on-line learning.

Once a mechanism for learning a spatial representation is available, a control scheme can be defined in terms of this representation. An overview of the proposed control architecture is given in Figure 3. The goal of a control task is specified in terms of the 3D representation of two camera views. The initial view corresponds to some features of the robot end-effector at its starting location, and the final view represents the same features in the goal configuration. The difference between the computed spatial representations of the features in the initial and final views is then used as a feedback signal to drive the controller. Since the representation (and hence the feedback) does not change with changes in camera configuration, active camera control is decoupled from the robot control problem. Consider the assembly workcell shown in Figure 1. If the robot end-effector goes out of the camera's field-of-view during an assembly operation, the active camera system can be

Robotica (1998) volume 16, pp. 309-327. Printed in the United Kingdom © 1998 Cambridge University Press
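The decoupling idea described above can be sketched schematically. In this toy sketch, `spatial_repr` stands in for the camera-invariant representation learned by the neural network, and the loop body stands in for robot motion; both names and the simple proportional update are assumptions for illustration, not the paper's actual learning rule or controller.

```python
import numpy as np

def spatial_repr(view_features):
    """Stand-in for the learned mapping from a camera view's image
    features to a camera-invariant 3D representation."""
    return np.asarray(view_features, dtype=float)  # placeholder mapping

def control_loop(initial_view, goal_view, step=0.2, tol=1e-3, max_iter=100):
    """Drive the end-effector until its representation matches the goal's.

    Because the representation does not change when the camera is
    reconfigured, the active camera may move freely during this loop
    without invalidating the feedback signal."""
    x = spatial_repr(initial_view)
    x_goal = spatial_repr(goal_view)
    for _ in range(max_iter):
        error = x_goal - x  # feedback: difference of the two representations
        if np.linalg.norm(error) < tol:
            break
        x = x + step * error  # robot motion reduces the representation error
    return x

final = control_loop([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
print(np.allclose(final, [1.0, 2.0, 3.0], atol=1e-2))  # prints True
```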