Eye-in-Hand/Eye-to-Hand Multi-Camera Visual Servoing

Vincenzo Lippiello, Bruno Siciliano, Luigi Villani

The authors are with PRISMA Lab, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy. {lippiell,siciliano,lvillani}@unina.it

Abstract— A position-based visual servoing algorithm using a hybrid eye-in-hand/eye-to-hand multi-camera configuration is presented in this paper. Based on an extended Kalman filter, this approach exploits the data provided by all the cameras without "a priori" discrimination, allowing real-time object pose estimation. A suitable algorithm selects an optimal subset of image features on the basis of the desired task and of the current configuration of the workspace. Only this subset is considered for feature extraction, thus ensuring a computational cost independent of the number of cameras. Experimental results are reported to demonstrate the feasibility and the effectiveness of the proposed technique.

I. INTRODUCTION

The adoption of visual feedback for closed-loop control of robot manipulators is becoming common practice both in research and in industry. This approach is known as visual servoing. Moreover, the increase in the performance/cost ratio of machine vision is opening new scenarios where multi-camera systems are employed (see [1] and [2]).

The two most widely adopted camera configurations are known as eye-in-hand, where one or more cameras are rigidly attached to the robot end effector, and eye-to-hand, where the cameras are fixed in the workspace [3]. The first guarantees good accuracy and the ability to explore the workspace, although with a limited field of view; the second ensures a panoramic view of the workspace, but lower accuracy. Hence, the use of both configurations at the same time makes the execution of complex tasks easier and offers higher flexibility in the presence of a dynamic scenario.

Recently, some effort has been made to design visual servoing systems based on hybrid eye-in-hand/eye-to-hand camera configurations. In [4] an eye-to-hand camera is in charge of positioning the robot tool, while an eye-in-hand camera is in charge of its orientation. A similar approach is used in [5], where an eye-to-hand camera is employed to estimate the robot tool pose with respect to the workspace and an eye-in-hand camera is employed as a data source for object pose estimation. Further, in [6], a camera mounted on the end effector of one robot is adopted as an eye-to-hand camera for another robot, in order to benefit from the advantages of a mobile camera.

None of the above approaches fully exploits the potentialities of hybrid camera configurations: the information provided by the different types of cameras (fixed or mobile) is employed for different goals, so a complete integration is not really achieved. Moreover, the possibility of adopting a multi-camera visual system for both camera configurations is not considered.

In this work, a new approach based on the Extended Kalman Filter (EKF) is proposed to achieve complete data fusion in a multi-camera eye-in-hand/eye-to-hand visual system. This approach allows the data provided by all the cameras to be used at the same time, without any kind of "a priori" discrimination. A suitable image-feature selection algorithm dynamically selects the data required for the execution of a specific task, depending on the current configuration of the workspace. Only the selected features are grabbed and processed to obtain the measurements; thus, the computational time spent on image processing is independent of the number of cameras. The Kalman filter computes the estimate of the pose of a moving object in the visible workspace, which is fed back to a position-based visual servoing algorithm.
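To fix ideas, the following Python fragment gives a minimal sketch of this fusion scheme under assumed interfaces; the function names, the state layout, and the per-feature quality score are illustrative placeholders, not the paper's implementation. The point is that the selected features from every camera, fixed or mobile, enter one stacked measurement vector, so no camera is privileged a priori.

```python
# Minimal sketch (assumed interfaces, not the authors' code) of an EKF
# correction that fuses image features from all cameras in a single update.
import numpy as np

def ekf_update(x, P, z, h_pred, H, R):
    """Standard EKF measurement update.
    x: object pose/velocity state; P: state covariance;
    z: stacked feature measurements from ALL selected cameras;
    h_pred: predicted measurements; H: stacked measurement Jacobian;
    R: measurement noise covariance."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ (z - h_pred)           # corrected pose estimate
    P_new = (np.eye(x.size) - K @ H) @ P   # corrected covariance
    return x_new, P_new

def select_features(candidates, n_best):
    """Pool candidate features from eye-in-hand and eye-to-hand cameras
    without distinction and keep the n_best according to a task-dependent
    quality score (a placeholder for the paper's optimality criterion),
    so that extraction cost is independent of the number of cameras."""
    return sorted(candidates, key=lambda f: f["score"], reverse=True)[:n_best]
```

With this structure, adding or removing a camera only changes the rows of z, H, and R, not the filter itself.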
Since the frequency of the pose estimation algorithm is limited by the camera frame rate (25-60 Hz), while a higher control bandwidth (more than 100 Hz) is required to guarantee stability and disturbance rejection for position control of a robot manipulator, an "indirect" visual servoing algorithm is implemented [3]. This scheme is based on an inner/outer feedback structure, where the inner position feedback loop runs at a higher frequency than the outer visual feedback loop.

This paper is organized as follows. In Section II the model of the visual system and of the workspace is presented. The formulation of the EKF is illustrated in Section III. In Section IV the pose estimation algorithm is described and the position-based visual servoing control scheme is briefly outlined. Experimental results for the case of two robots performing a vision-guided master/slave trajectory-following task are presented in Section V.

II. MODELING

Consider a system of $n_f$ video cameras fixed in the workspace (eye-to-hand cameras) and $n_m$ video cameras mounted on the end effector of one or more robots (eye-in-hand cameras), with $n = n_f + n_m$. The geometry of the system with respect to a generic camera can be described using the classical pinhole model (see Fig. 1). In the following, the symbols F and M denote the sets of eye-to-hand and eye-in-hand cameras, respectively; moreover, the index $c_i$ denotes quantities referred to the frame of camera $i$. For each camera, a frame $O_{c_i} x_{c_i} y_{c_i} z_{c_i}$ attached to camera $c_i$ is considered, with the $z_{c_i}$-axis aligned with the optical axis and the origin at the optical center. The sensor plane is parallel to the $x_{c_i} y_{c_i}$-plane at a distance $-\lambda_{e,c_i}$ along the $z_{c_i}$-axis, where $\lambda_{e,c_i}$ is the effective focal length of the camera.
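To make the notation concrete, here is a minimal sketch of this projection under the stated convention; the function name and interface are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of the classical pinhole projection described above:
# a point expressed in the frame of camera ci maps onto the sensor plane
# located at -lambda_e along the optical (z) axis.
import numpy as np

def pinhole_project(p_cam, lambda_e):
    """Project p_cam = (x, y, z), given in the camera frame, onto the
    sensor plane at distance -lambda_e along the z-axis (lambda_e is the
    effective focal length). Returns image-plane coordinates."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # The physical sensor plane sits behind the optical center, so the
    # image is inverted (hence the minus signs); the equivalent "frontal
    # plane" model at +lambda_e simply drops them.
    return np.array([-lambda_e * x / z, -lambda_e * y / z])
```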