3D Monocular Robotic Ball Catching
with an Iterative Trajectory Estimation Refinement
Vincenzo Lippiello and Fabio Ruggiero
Abstract— In this paper, a 3D robotic ball-catching algorithm
that employs only an eye-in-hand monocular vision system is
presented. A partitioned visual servoing control generates the
robot motion while always keeping the ball in the camera's
field of view. When the ball is detected, the camera mounted
on the robot end effector is commanded to follow a suitable
baseline in order to acquire measurements and provide a first
candidate interception point through a linear estimation
process. Thereafter, further visual measurements are acquired
to continuously refine the previous prediction through a
nonlinear estimation process. Experimental results show the
effectiveness of the proposed solution.
I. INTRODUCTION
Smart sensing, object tracking, motion prediction, on-line
trajectory planning and motion coordination are capabilities
required in a robotic system to catch a thrown ball.
One of the first approaches using robot manipulators to
catch moving objects can be found in [1], while in [2] a
stereo vision system with a large baseline, an extended
Kalman filter (EKF), and a ball trajectory predictor are
combined to build a robotic ball catcher. The same system,
equipped with a dexterous multi-fingered hand, is tested
in [3], while a more recent version of this work is proposed
in [4], involving a mobile humanoid and a circular gradient
method to detect the ball in the images.
In [5], [6] a high-speed multi-fingered hand and a high-
speed stereo vision system are employed to catch a falling
ball and a falling cylinder. A robotic arm whose aim is
to catch a ball before it falls off a table is considered
in [7], where uncalibrated cameras are employed to track
the moving ball. In [8] the control is designed to achieve
simultaneously all the 2D tasks defined for all the images: if
all the respective image-space goals are accomplished at the
same time, then the 3D task can be considered successful.
A DSP is employed as the computational platform in the
catcher robot system developed in [9], while in [10] an
iterative prediction algorithm determines the final time
and position of a humanoid motion, whose primitives derive
from studies on human movements.
Other examples of robotic ball catching can be found
in [11], where the ball path is predicted by means of a
suitable neural network. The ball-catching task is also
considered as a case study in several virtual-reality appli-
cations [12], [13], [14].
The authors are with PRISMA Lab, Dipartimento di Informatica e
Sistemistica, Università degli Studi di Napoli Federico II, via Claudio 21,
80125, Naples, Italy {lippiello, fabio.ruggiero}@unina.it
Several papers make use of Chapman's strategy to catch
a ball: the fielder should run at a speed such that the
tangent of the ball's elevation angle increases at a constant
rate [15]. In [16] reinforcement learning models are used,
while an autonomous mobile robot is considered in [17] for
a ball-catching task using a visual feedback control method
based on a Linear Optimal Trajectory strategy. An alternative
strategy, still based on Chapman's hypothesis and called
Gaining Angle of Gaze, is introduced in [18]; it requires
only the elevation angle of gaze, captured as 2D information.
Finally, in [19] a motion-analysing technique over a finite
time is used inside a closed-loop system in order to catch a
thrown ball.
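Chapman's constant-rate property can be checked numerically: for a ballistic trajectory observed from its landing point, the tangent of the elevation angle grows linearly in time, so a fielder standing at (or converging to) the landing point sees a constant rate of increase. The snippet below is an illustrative verification of this geometric fact only, not code from any of the cited works; drag is neglected and the function names are chosen here.

```python
# Numerical check of Chapman's observation: for a drag-free ballistic ball
# seen from its landing point, tan(elevation angle) increases at a constant
# rate. Launch parameters vx, vy are arbitrary illustrative values.
G = 9.81  # gravitational acceleration [m/s^2]

def tan_elevation(t, vx, vy):
    """Tangent of the ball's elevation angle, seen from the landing point."""
    T = 2.0 * vy / G                  # total flight time of the ball
    height = vy * t - 0.5 * G * t * t # ball height at time t
    dist = vx * (T - t)               # horizontal distance fielder -> ball
    return height / dist

vx, vy = 10.0, 14.0
ts = [0.5, 1.0, 1.5, 2.0]
tans = [tan_elevation(t, vx, vy) for t in ts]
# finite-difference growth rates of tan(elevation) between samples
rates = [(b - a) / 0.5 for a, b in zip(tans, tans[1:])]
```

Analytically, tan(elevation) reduces to (G/2) t / vx, so every entry of `rates` equals G / (2 vx).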
Most of the presented approaches use either a stereo vision
system to solve the 3D catching problem or a single camera
for the 2D case. This choice is reasonable because 3D
tracking of the ball benefits from triangulation methods
while, in the case of a single camera, only 2D information
is directly available. However, a high frame rate and
accurate optics are required to achieve an accurate and
fast trajectory prediction, i.e. a successful catch. By using
only one camera, the cost of the equipment can be reduced.
Moreover, the calibration procedure for one camera is easier
than in the stereo case. In [20] the 3D state of a thrown
object is estimated through a least-squares solution applied
to a sequence of images from a single camera. Further,
in [21] a combination of image-based and position-based
visual servoing with an eye-to-hand camera configuration is
employed to catch a ball whose trajectory is estimated
through a recursive least-squares (RLS) algorithm.
In this paper, a monocular 3D robotic ball-catching
algorithm is proposed. A robot manipulator with a standard
CCD camera mounted in an eye-in-hand configuration is driven
by visual information in order to track a thrown ball. When
the ball is detected for the first time, the camera is
commanded to follow a suitable baseline in the 3D space in
order to increase the estimation robustness. The ball is
always kept in the camera field of view through a partitioned
visual servoing control. During this initial motion, 2D
information is collected and processed to obtain a first
prediction of the ball trajectory through a rough linear
estimation; this prediction then serves as the starting point
for a more precise trajectory refinement through a nonlinear
estimator. Hence, the visual measurements are continuously
processed to update the estimate of the ball trajectory
on-line, and thus the prediction of the interception pose.
Finally, whenever the continuous refinement no longer
significantly improves the prediction of the trajectory, the
final catching pose can be
2012 IEEE International Conference on Robotics and Automation
RiverCentre, Saint Paul, Minnesota, USA
May 14-18, 2012
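To make the first (linear) stage of such a two-stage estimation concrete: under a ballistic model with known gravity, the trajectory is affine in the unknown initial position and velocity once the gravity term is subtracted, so each axis can be fitted by ordinary least squares. The sketch below is a simplified illustration that assumes 3D position samples are already available; the function names are hypothetical, and this is not the authors' implementation, which works from 2D image measurements and adds a nonlinear refinement stage.

```python
# Sketch of a linear ballistic-trajectory fit: with gravity known, each
# coordinate p(t) = p0 + v0*t + 0.5*a*t^2 becomes affine in (p0, v0) after
# subtracting the gravity contribution, so a per-axis line fit recovers them.
G = 9.81  # gravitational acceleration [m/s^2]

def fit_ballistic(samples):
    """samples: list of (t, (x, y, z)) ball positions (z up).
    Returns (p0, v0): estimated initial position and velocity."""
    n = len(samples)
    ts = [t for t, _ in samples]
    t_mean = sum(ts) / n
    p0, v0 = [], []
    for axis in range(3):
        g_axis = -G if axis == 2 else 0.0  # gravity acts on z only
        # remove the known gravity term, leaving y = p0 + v0 * t
        ys = [p[axis] - 0.5 * g_axis * t * t for t, p in samples]
        y_mean = sum(ys) / n
        # closed-form simple linear regression
        sxx = sum((t - t_mean) ** 2 for t in ts)
        sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
        v = sxy / sxx
        v0.append(v)
        p0.append(y_mean - v * t_mean)
    return p0, v0

def predict(p0, v0, t):
    """Ball position at time t under the fitted ballistic model."""
    return [p0[i] + v0[i] * t + (-0.5 * G * t * t if i == 2 else 0.0)
            for i in range(3)]
```

A rough estimate of this kind is then a natural initial guess for a nonlinear refinement (e.g. minimizing image-space reprojection error), which is the role the paper assigns to its second stage.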