3D Monocular Robotic Ball Catching with an Iterative Trajectory Estimation Refinement

Vincenzo Lippiello and Fabio Ruggiero

Abstract— In this paper, a 3D robotic ball-catching algorithm that employs only an eye-in-hand monocular visual system is presented. A partitioned visual servoing control is used to generate the robot motion while always keeping the ball in the field of view of the camera. When the ball is detected, the camera mounted on the robot end-effector is commanded to follow a suitable baseline in order to acquire measurements and provide a first possible interception point through a linear estimation process. Thereafter, further visual measurements are acquired to continuously refine the previous prediction through a non-linear estimation process. Experimental results show the effectiveness of the proposed solution.

I. INTRODUCTION

Smart sensing, object tracking, motion prediction, on-line trajectory planning and motion coordination are the capabilities required by a robotic system to catch a thrown ball. One of the first approaches in which robot manipulators are used to catch moving objects can be found in [1], while in [2] a stereo vision system with a large baseline, an extended Kalman filter (EKF) and a ball trajectory predictor are exploited to build a robotic ball catcher. The same system equipped with a dexterous multi-fingered hand is tested in [3], while a new version of this work has recently been proposed in [4], involving a mobile humanoid and a circular gradient method to detect the ball in the images. In [5], [6] a high-speed multi-fingered hand and a high-speed stereo vision system are employed to catch a falling ball and a falling cylinder. A robotic arm whose aim is to catch a ball before it falls from a table is considered in [7], where uncalibrated cameras are employed to track the moving ball.
In [8] the control is applied to simultaneously achieve all the 2D tasks defined for all the images: if all the respective goals defined in the images are accomplished at the same time, then the 3D task can be considered successful. A DSP is employed as the computational platform in the catcher robot system developed in [9], while in [10] an iterative prediction algorithm determines the final time and position of a humanoid motion, whose primitives derive from studies on human movements. Other examples of robotic ball catching can be found in [11], where the ball path is predicted with the help of a suitable neural network. The ball-catching task is also considered as a case study in several virtual-reality applications [12], [13], [14].

The authors are with the PRISMA Lab, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, via Claudio 21, 80125, Naples, Italy. {lippiello, fabio.ruggiero}@unina.it

Several papers make use of Chapman's strategy – the fielder should run at a proper speed so as to maintain a constant increasing rate of the tangent of the ball's elevation angle [15] – to catch a ball. In [16] reinforcement learning models are used, while an autonomous mobile robot is considered in [17] for a ball-catching task using a visual feedback control method based on a Linear Optimal Trajectory strategy. In [18] an alternative strategy still based on Chapman's hypothesis, called Gaining Angle of Gaze, is introduced, which requires only the elevation angle of the gaze captured as 2D information. Finally, in [19] a motion-analysing technique over a finite time is used inside a closed-loop system in order to catch a thrown ball. Most of the presented approaches use either a stereo visual system to solve the 3D catching problem or a single camera for the 2D case.
This scenario is reasonable because 3D tracking of the ball benefits from triangulation methods, while in the case of a single camera only 2D information is directly available. However, a high frame rate and accurate optics are required to achieve an accurate and fast trajectory prediction, i.e. a successful catch. By using only one camera, the cost of the equipment can be reduced; moreover, the calibration procedure for one camera is easier than in the stereo case. In [20] the estimation of the 3D state of a thrown object using a least-squares solution, starting from a sequence of images given by a single camera, is presented. Further, in [21] a combination of image-based and position-based visual servoing with an eye-to-hand camera configuration is employed to catch a ball whose trajectory is estimated through a recursive least-squares (RLS) algorithm.

In this paper, a monocular 3D robotic ball-catching algorithm is proposed. A robot manipulator with a standard CCD camera mounted in an eye-in-hand configuration is driven by visual information in order to track a thrown ball. When the ball is detected for the first time, the camera is commanded to follow a suitable baseline in the 3D space so as to increase the estimation robustness. The ball is always kept in the camera field of view through a partitioned visual servoing control. During this starting motion, 2D information is collected and processed to obtain a first prediction of the ball trajectory through a rough linear estimation; this prediction is then employed as the starting point for a more precise trajectory refinement through a nonlinear estimator. Hence, the visual measurements are continuously processed to update the estimation of the ball trajectory on-line, and thus the prediction of the interception pose.
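As an illustrative sketch of this two-stage scheme (not the paper's actual implementation), the following Python fragment estimates a drag-free ballistic trajectory p(t) = p0 + v0 t + (1/2) g t^2 from 2D measurements taken by a camera moving along a known baseline: a DLT-style linear least-squares step provides the rough initial estimate, and a Gauss-Newton iteration on the true reprojection error refines it. The camera model (a normalized pinhole looking along the world x-axis, with fixed orientation and known centres c_k) and all function names are assumptions introduced for this example.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # assumed drag-free ballistic model

def ball_pos(p0, v0, t):
    """Ballistic model p(t) = p0 + v0*t + 0.5*g*t^2."""
    return p0 + v0 * t + 0.5 * G * t**2

def project(p, c):
    """Normalized pinhole at centre c looking along world +x:
    u = (y - cy)/(x - cx), v = (z - cz)/(x - cx)."""
    d = p - c
    return np.array([d[1] / d[0], d[2] / d[0]])

def linear_init(ts, cs, uvs):
    """Rough linear estimate of (p0, v0): each measurement (u, v) yields
    two equations algebraically linear in the six unknowns, e.g.
    u*(x(t) - cx) - (y(t) - cy) = 0."""
    A, b = [], []
    for t, c, (u, v) in zip(ts, cs, uvs):
        grav = 0.5 * G * t**2
        for coef, i in ((u, 1), (v, 2)):   # i=1 -> y-equation, i=2 -> z-equation
            row = np.zeros(6)              # unknowns: [p0x p0y p0z v0x v0y v0z]
            row[0], row[3] = coef, coef * t
            row[i], row[3 + i] = -1.0, -t
            A.append(row)
            b.append(coef * c[0] - c[i] - coef * grav[0] + grav[i])
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x[:3], x[3:]

def refine(ts, cs, uvs, p0, v0, iters=10, eps=1e-6):
    """Gauss-Newton refinement of (p0, v0) minimizing the true reprojection
    error (the linear step only minimizes an algebraic surrogate)."""
    x = np.concatenate([p0, v0])

    def residuals(x):
        return np.concatenate([
            project(ball_pos(x[:3], x[3:], t), c) - uv
            for t, c, uv in zip(ts, cs, uvs)])

    for _ in range(iters):
        r = residuals(x)
        J = np.empty((len(r), 6))
        for j in range(6):                 # numerical Jacobian, column by column
            dx = np.zeros(6)
            dx[j] = eps
            J[:, j] = (residuals(x + dx) - r) / eps
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + step
    return x[:3], x[3:]
```

With noise-free synthetic measurements the linear step already recovers the trajectory exactly; with image noise the Gauss-Newton stage is the one that actually reduces the reprojection error, mirroring the linear-then-nonlinear refinement described above.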
Finally, whenever the continuous refinement no longer significantly improves the prediction of the trajectory, the final catching pose can be fixed.

2012 IEEE International Conference on Robotics and Automation, RiverCentre, Saint Paul, Minnesota, USA, May 14-18, 2012. 978-1-4673-1405-3/12/$31.00 ©2012 IEEE
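The stopping rule just mentioned (commit to the catching pose once further refinement no longer changes the prediction significantly) can be sketched as follows; the function name, tolerance and window size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def prediction_converged(history, tol=0.005, window=3):
    """Return True once the last `window` interception-point predictions
    (3D points, in metres) all lie within `tol` of the most recent one,
    i.e. the iterative refinement has stopped moving the estimate."""
    if len(history) < window:
        return False
    last = np.asarray(history[-1], dtype=float)
    return all(np.linalg.norm(np.asarray(h, dtype=float) - last) <= tol
               for h in history[-window:])
```

In use, the estimator would append each newly refined interception point to `history` and freeze the catching pose as soon as `prediction_converged(history)` becomes True.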