A framework for active vision-based robot control using neural
networks
Rajeev Sharma and Narayan Srinivasa
The Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign,
405 N. Mathews Avenue, Urbana, IL 61801 (USA)
SUMMARY
Assembly robots that use an active camera system for visual
feedback can achieve greater flexibility, including the ability
to operate in an uncertain and changing environment.
Incorporating active vision into a robot control loop
involves some inherent difficulties, including calibration and the need to redefine the servoing goal as the camera
configuration changes. In this paper, we propose a novel
self-organizing neural network that learns a calibration-free
spatial representation of 3D point targets in a manner that is
invariant to changing camera configurations. This representation is used to develop a new framework for robot
control with active vision. The salient feature of this
framework is that it decouples active camera control from
robot control. The feasibility of this approach is established
with the help of computer simulations and experiments with
the University of Illinois Active Vision System (UIAVS).
KEYWORDS: Active vision; Visual servoing; Assembly; Learning; Neural network.
1. INTRODUCTION
Visual feedback has great potential in increasing the
flexibility of robotic assembly operations, for example,
being able to operate in an imprecisely calibrated workcell
and dealing with unexpected changes in the workcell.[1] The
visual feedback is usually provided either by a set of
stationary cameras or by a camera-in-hand setup where the
camera is mounted on the assembly robot itself. However, both arrangements greatly limit the scope of the robotic tasks. For
example, when using fixed cameras, during a typical
assembly operation, various portions of the workcell may go
out of the field of view of the camera, or be out of focus, etc.
With a camera-in-hand setup the view of the workcell is
limited by the task being executed and thus its usefulness
may be restricted to tasks such as tracking. An alternative is
to use active vision where a separately mounted motorized
camera setup can be independently and dynamically
reconfigured during the course of an assembly operation
(see Figure 1). Active vision has been shown to greatly improve the process of image interpretation and vision-based control.[2-5] Although significant advances have been made in active vision research, much of its potential remains unrealized in robot control.[6]
Incorporating visual feedback into classical robot control
leads to the visual servo control problem. A recent survey of
the different mechanisms of visual feedback involved in
visual servo control can be found in Corke.[7] An important distinction is the feedback representation mode, which can be either position-based or
image-based (see Figure 2). Position-based servoing uses
the visual image of the scene to ‘‘reconstruct’’ the
surrounding 3D environment. The absolute positions of the
objects gathered from this reconstruction are used for robot
motion planning and control. The position-based approach
thus involves an image interpretation step (e.g. depth from
stereo) in the control loop which is difficult to implement
with an active camera. On the other hand, an image-based servoing process bypasses the 3D world reconstruction and uses image features directly to control robot motion.[8-12]
Image-based servoing observes how differential changes in the robot configuration space relate to differential changes in the image feature space, and then uses this derived relationship, together with the expected goal features, to control robot motion. The disadvantage of the image-based approach is that the control goal is hard to specify with changing camera configurations.
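This differential relationship is commonly captured by an image Jacobian. Purely as an illustrative sketch of a standard pseudo-inverse (resolved-rate) control step of this kind, not the control law developed in this paper, and with entirely hypothetical Jacobian and feature values:

```python
import numpy as np

def image_based_step(J, f_current, f_goal, gain=0.5):
    """One step of image-based servoing: map the feature-space error
    back to a joint-space velocity via the pseudo-inverse of the
    image Jacobian J (where df = J @ dq)."""
    error = f_goal - f_current            # error measured in image space
    dq = gain * np.linalg.pinv(J) @ error  # joint-space correction
    return dq

# Hypothetical 4-feature, 3-joint Jacobian and image feature vectors.
J = np.array([[1.0, 0.0, 0.2],
              [0.0, 1.0, 0.1],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 1.0]])
f = np.array([10.0, 5.0, 2.0, 1.0])
f_goal = np.array([12.0, 6.0, 2.5, 1.5])
dq = image_based_step(J, f, f_goal)
```

Iterating such steps drives the observed features toward the goal features, but as noted above, the goal feature vector itself must be respecified whenever the camera configuration changes.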
Thus, there are many issues to be addressed before using
an active camera for robot control. One major issue is that
of calibration of different components of the robot/camera
system. Another important issue is that of defining the
control goal as the camera configuration changes. In this
paper we address these issues and propose a framework for active vision-based control that exploits unique properties of
a 3D spatial representation learned by a neural network.
This learning is achieved by a novel neural network which
is easy to implement on a robotic active vision system and
is capable of on-line learning.
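The network itself is developed in later sections; purely to illustrate the general flavor of on-line self-organized learning, a generic one-dimensional Kohonen-style self-organizing map (not the authors' architecture; all dimensions and parameters below are hypothetical) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# A chain of units, each with a weight vector in the input space
# (here, a hypothetical 4D vector of stereo image coordinates of
# a 3D point target).
n_units, dim = 25, 4
weights = rng.uniform(0.0, 1.0, size=(n_units, dim))

def som_update(weights, x, lr=0.2, sigma=2.0):
    """One on-line update: find the best-matching unit (BMU) and
    pull it and its index-space neighbors toward the input x."""
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    idx = np.arange(len(weights))
    # Neighborhood influence decays with distance from the BMU.
    h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)
    return weights

# Present samples one at a time, with no separate training phase,
# which is what makes the learning on-line.
for _ in range(200):
    x = rng.uniform(0.0, 1.0, size=dim)
    weights = som_update(weights, x)
```

Each update uses only the current sample, so learning can proceed continuously while the robotic active vision system operates.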
Once a mechanism for learning a spatial representation is
available, a control scheme can be defined in terms of this
representation. An overview of the proposed control architecture is given in Figure 3. The goal of a control task is
specified in terms of the 3D representation of two camera
views. The initial view corresponds to some features of the
robot end-effector at its starting location and the final view
represents the same features in the goal configuration. The
difference between the computed spatial representation of
features from the initial and final views is then used as a
control feedback signal to drive the controller. Since the
representation (and hence the feedback) does not change
with changes in camera configuration, active camera control
is decoupled from the robot control problem. Consider the
assembly workcell shown in Figure 1. If the robot end-effector goes out of the camera’s field-of-view during an
assembly operation, the active camera system can be
Robotica (1998) volume 16, pp. 309–327. Printed in the United Kingdom © 1998 Cambridge University Press