3D Monocular Robotic Ball Catching
with an Iterative Trajectory Estimation Refinement
Vincenzo Lippiello and Fabio Ruggiero
Abstract— In this paper, a 3D robotic ball-catching algorithm
that employs only an eye-in-hand monocular vision system is
presented. A partitioned visual servoing control generates the
robot motion while always keeping the ball in the camera's
field of view. When the ball is detected, the camera mounted
on the robot end effector is commanded to follow a suitable
baseline in order to acquire measurements and provide a first
candidate interception point through a linear estimation
process. Thereafter, further visual measurements are acquired
to continuously refine the previous prediction through a
nonlinear estimation process. Experimental results show the
effectiveness of the proposed solution.
I. INTRODUCTION
Smart sensing, object tracking, motion prediction, on-line
trajectory planning and motion coordination are capabilities
required in a robotic system to catch a thrown ball.
One of the first approaches using robot manipulators to
catch moving objects can be found in [1], while in [2] a
stereo vision system with a large baseline, an extended
Kalman filter (EKF), and a ball trajectory predictor are
combined to build a robotic ball catcher. The same system,
equipped with a dexterous multi-fingered hand, is tested
in [3], while a more recent version of this work is proposed
in [4], involving a mobile humanoid and a circular gradient
method to detect the ball in the images.
In [5], [6] a high-speed multi-fingered hand and a high-
speed stereo vision system are employed to catch a falling
ball and a falling cylinder. A robotic arm whose aim is
to catch a ball before it falls off a table is considered
in [7], where uncalibrated cameras are employed to track
the moving ball. In [8] the control is designed to achieve
simultaneously all the 2D tasks defined for all the images: if
all the respective image-space goals are accomplished at the
same time, then the 3D task can be considered successful.
A DSP is employed as the computational platform in the
catcher robot system developed in [9], while in [10] an
iterative prediction algorithm determines the final time
and position of a humanoid motion, whose primitives derive
from studies on human movements.
Other examples of robotic ball catching can be found
in [11], where the ball path is predicted by means of a
suitable neural network. The ball-catching task is also
considered as a case study in several virtual-reality appli-
cations [12], [13], [14].
The authors are with PRISMA Lab, Dipartimento di Informatica e
Sistemistica, Università degli Studi di Napoli Federico II, via Claudio 21,
80125, Naples, Italy {lippiello, fabio.ruggiero}@unina.it
Several papers make use of Chapman's strategy to catch
a ball: the fielder should run at a speed such that the
tangent of the ball's elevation angle increases at a constant
rate [15]. In [16] reinforcement learning models are used,
while an autonomous mobile robot is considered in [17] for
a ball-catching task using a visual feedback control method
based on a Linear Optimal Trajectory strategy. An alternative
strategy, still based on Chapman's hypothesis and called
Gaining Angle of Gaze, is introduced in [18]; it requires
only the elevation angle of gaze, captured as 2D information.
Finally, in [19] a motion-analysing technique over a finite
time is used inside a closed-loop system in order to catch a
thrown ball.
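Chapman's constant-rate property can be checked numerically: for a ballistic trajectory observed from its landing point, the tangent of the elevation angle grows linearly in time, so a fielder standing at (or converging to) the landing point sees a constant rate of increase. The snippet below is an illustrative verification of this geometric fact only, not code from any of the cited works; drag is neglected and the function names are chosen here.

```python
# Numerical check of Chapman's observation: for a drag-free ballistic ball
# seen from its landing point, tan(elevation angle) increases at a constant
# rate. Launch parameters vx, vy are arbitrary illustrative values.
G = 9.81  # gravitational acceleration [m/s^2]

def tan_elevation(t, vx, vy):
    """Tangent of the ball's elevation angle, seen from the landing point."""
    T = 2.0 * vy / G                  # total flight time of the ball
    height = vy * t - 0.5 * G * t * t # ball height at time t
    dist = vx * (T - t)               # horizontal distance fielder -> ball
    return height / dist

vx, vy = 10.0, 14.0
ts = [0.5, 1.0, 1.5, 2.0]
tans = [tan_elevation(t, vx, vy) for t in ts]
# finite-difference growth rates of tan(elevation) between samples
rates = [(b - a) / 0.5 for a, b in zip(tans, tans[1:])]
```

Analytically, tan(elevation) reduces to (G/2) t / vx, so every entry of `rates` equals G / (2 vx).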
Most of the presented approaches use either a stereo vision
system to solve the 3D catching problem or a single camera
for the 2D case. This choice is reasonable because 3D
tracking of the ball benefits from triangulation methods
while, in the case of a single camera, only 2D information
is directly available. However, a high frame rate and
accurate optics are required to achieve an accurate and
fast trajectory prediction, i.e. a successful catch. By using
only one camera, the cost of the equipment can be reduced.
Moreover, the calibration procedure for one camera is easier
than in the stereo case. In [20] the 3D state of a thrown
object is estimated through a least-squares solution applied
to a sequence of images from a single camera. Further,
in [21] a combination of image-based and position-based
visual servoing with an eye-to-hand camera configuration is
employed to catch a ball whose trajectory is estimated
through a recursive least-squares (RLS) algorithm.
In this paper, a monocular 3D robotic ball-catching
algorithm is proposed. A robot manipulator with a standard
CCD camera mounted in an eye-in-hand configuration is driven
by visual information in order to track a thrown ball. When
the ball is detected for the first time, the camera is
commanded to follow a suitable baseline in the 3D space in
order to increase the estimation robustness. The ball is
always kept in the camera field of view through a partitioned
visual servoing control. During this initial motion, 2D
information is collected and processed to obtain a first
prediction of the ball trajectory through a rough linear
estimation; this prediction then serves as the starting point
for a more precise trajectory refinement through a nonlinear
estimator. Hence, the visual measurements are continuously
processed to update the estimate of the ball trajectory
on-line, and thus the prediction of the interception pose.
Finally, whenever the continuous refinement no longer
significantly improves the prediction of the trajectory, the
final catching pose can be
2012 IEEE International Conference on Robotics and Automation
RiverCentre, Saint Paul, Minnesota, USA
May 14-18, 2012
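To make the first (linear) stage of such a two-stage estimation concrete: under a ballistic model with known gravity, the trajectory is affine in the unknown initial position and velocity once the gravity term is subtracted, so each axis can be fitted by ordinary least squares. The sketch below is a simplified illustration that assumes 3D position samples are already available; the function names are hypothetical, and this is not the authors' implementation, which works from 2D image measurements and adds a nonlinear refinement stage.

```python
# Sketch of a linear ballistic-trajectory fit: with gravity known, each
# coordinate p(t) = p0 + v0*t + 0.5*a*t^2 becomes affine in (p0, v0) after
# subtracting the gravity contribution, so a per-axis line fit recovers them.
G = 9.81  # gravitational acceleration [m/s^2]

def fit_ballistic(samples):
    """samples: list of (t, (x, y, z)) ball positions (z up).
    Returns (p0, v0): estimated initial position and velocity."""
    n = len(samples)
    ts = [t for t, _ in samples]
    t_mean = sum(ts) / n
    p0, v0 = [], []
    for axis in range(3):
        g_axis = -G if axis == 2 else 0.0  # gravity acts on z only
        # remove the known gravity term, leaving y = p0 + v0 * t
        ys = [p[axis] - 0.5 * g_axis * t * t for t, p in samples]
        y_mean = sum(ys) / n
        # closed-form simple linear regression
        sxx = sum((t - t_mean) ** 2 for t in ts)
        sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
        v = sxy / sxx
        v0.append(v)
        p0.append(y_mean - v * t_mean)
    return p0, v0

def predict(p0, v0, t):
    """Ball position at time t under the fitted ballistic model."""
    return [p0[i] + v0[i] * t + (-0.5 * G * t * t if i == 2 else 0.0)
            for i in range(3)]
```

A rough estimate of this kind is then a natural initial guess for a nonlinear refinement (e.g. minimizing image-space reprojection error), which is the role the paper assigns to its second stage.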