Visual Servoing for Dual Arm Motions on a Humanoid Robot

Nikolaus Vahrenkamp, Christian Böge ∗‡, Kai Welke, Tamim Asfour, Jürgen Walter, Rüdiger Dillmann

Institute for Anthropomatics, University of Karlsruhe, Haid-und-Neu Str. 7, 76131 Karlsruhe, Germany
{vahrenkamp,welke,asfour,dillmann}@ira.uka.de

Department of Mechanical Engineering and Mechatronics, Karlsruhe University of Applied Sciences, Moltkestr. 30, 76133 Karlsruhe, Germany
{christian.boege,juergen.walter}@hs-karlsruhe.de

Abstract— In this work we present a visual servoing approach that enables a humanoid robot to robustly execute dual arm grasping and manipulation tasks. To this end, the target object(s) and both hands are tracked alternately, and a combined open-/closed-loop controller is used for positioning the hands with respect to the target(s). We address the perception system and how the observable workspace can be increased by using an active vision system on a humanoid head. Furthermore, a control framework for reactive positioning of both hands using position-based visual servoing is presented, in which the sensor data streams coming from the vision system, the joint encoders, and the force/torque sensors are fused and joint velocity values are generated. This framework can be used for bimanual grasping as well as for two-handed manipulation, which is demonstrated with the humanoid robot Armar-III executing grasping and manipulation tasks in a kitchen environment.

I. INTRODUCTION

Humanoid robots are developed to work in human-centered environments and to assist people with housework, e.g. cleaning the dishes or serving a meal. To enable the robot to operate in a safe and robust manner, many components have to work together. In this paper we show how dual arm tasks, like bimanual grasping or dual arm manipulations, can be executed with high accuracy.
The fusion of multiple modalities combined with position-based visual servoing allows precise positioning of both arms in the workspace and thus enables the robot to execute dexterous dual arm tasks. The proposed algorithms are implemented and evaluated on the humanoid robot Armar-III [1].

The perceptual components needed for the dual arm visual servoing approach are discussed in section II. In section III the general approach for visually controlled movements is described, and the extensions for dual arm manipulation tasks are discussed in section IV. Finally, in section V the applicability of the proposed algorithms is demonstrated by two experiments on the humanoid robot Armar-III.

II. PERCEPTION

The positions of the target objects and of both hands have to be made available to the visual servoing approach. Therefore, the images from the stereo camera pair are processed with appropriate recognition and localization algorithms. Both the pose of the target objects and the position of the end effectors are determined in Cartesian space. For fast and robust object recognition and localization, the approach proposed in [2] is deployed. The marker-based end effector localization is performed similarly to the approach presented in [3]. In the following, we describe the extensions made to this prior work for the scenario at hand.

Fig. 1. Armar-III executing bimanual manipulations in the kitchen environment.

A. Object Recognition and Localization for Dual Arm Grasping Tasks

The vision framework of Armar-III offers methods for recognition and localization of everyday kitchen objects like cups or cereal boxes [2]. The algorithms can handle uniformly colored and textured objects as long as they are fully visible in the stereo camera images. If the robot is supposed to handle large objects, like the wok that is used in the experiments (see Fig. 2(a)), the object is not always visible as a whole in both camera images during the task.
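The full-visibility condition can be illustrated with a minimal sketch. It is not the Armar-III vision framework. It assumes an idealized, distortion-free pinhole stereo pair with parallel optical axes, and every number in it (focal length, image size, baseline, object sizes and distances) is a hypothetical value chosen only for illustration. The names `PinholeCamera`, `box_corners`, and `fully_visible` are likewise hypothetical. An object is approximated by the corners of its bounding box and counts as fully visible only if all corners project inside both images.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class PinholeCamera:
    """Idealized pinhole camera looking along +z (all values are assumptions)."""
    fx: float        # focal length in pixels, horizontal
    fy: float        # focal length in pixels, vertical
    cx: float        # principal point, horizontal (pixels)
    cy: float        # principal point, vertical (pixels)
    width: int       # image width in pixels
    height: int      # image height in pixels
    x_offset: float  # camera centre along the x axis in metres

    def sees(self, point):
        """Return True if the 3D point projects inside the image bounds."""
        x, y, z = point
        if z <= 0.0:  # behind the camera
            return False
        u = self.fx * (x - self.x_offset) / z + self.cx
        v = self.fy * y / z + self.cy
        return 0.0 <= u < self.width and 0.0 <= v < self.height


def box_corners(center, size):
    """Eight corners of an axis-aligned bounding box (metres)."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx, dy, dz in product((-1, 1), repeat=3)
    ]


def fully_visible(points, cameras):
    """True only if every point is inside the image of every camera."""
    return all(cam.sees(p) for cam in cameras for p in points)


# Hypothetical stereo pair: 640x480 images, 9 cm baseline.
cameras = [
    PinholeCamera(500.0, 500.0, 320.0, 240.0, 640, 480, x_offset=-0.045),
    PinholeCamera(500.0, 500.0, 320.0, 240.0, 640, 480, x_offset=+0.045),
]

# Hypothetical objects, both centred 0.5 m in front of the head.
cup = box_corners(center=(0.0, 0.0, 0.5), size=(0.08, 0.08, 0.08))
wok = box_corners(center=(0.0, 0.0, 0.5), size=(0.70, 0.20, 0.35))

print("cup fully visible:", fully_visible(cup, cameras))
print("wok fully visible:", fully_visible(wok, cameras))
```

With these assumed values, all corners of the small box stay inside both images, while corners of the larger box leave the image of at least one camera, so a localization method that requires full visibility would lose the large object at this distance.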
Therefore, the wok is decomposed into two smaller objects (the handles), which are easy to track, in order to avoid losing visual target information. The two handles are good features, since they mark the targets of the grasping actions and can be tracked independently.

B. Tracking Multiple Objects

When two hands are servoed using visual feedback, the hands have to be tracked in turn, since the area covered by vision is limited by the cameras' field of view. In order to enlarge this visually observable area, we move the