High-Speed Pose and Velocity Measurement from Vision
Redwan Dahmouche, Omar Ait-Aider, Nicolas Andreff and Youcef Mezouar
LASMEA - CNRS - Université Blaise Pascal
63175 Aubière, France
{firstname.lastname}@lasmea.univ-bpclermont.fr
Abstract— This paper presents a novel method for high-speed
pose and velocity computation from a visual sensor. The main
problem in high-speed vision is the bottleneck phenomenon
which limits the video transmission rate. The proposed approach
circumvents the problem by increasing the information
density instead of the data transmission rate. This strategy is
based on a rotary sequential acquisition of selected regions of
interest (ROI), which provides space-time data. This acquisition
mode induces a deformation of the image projection of dynamic
objects. This paper shows how to exploit this artifact for the
simultaneous measurement of both pose and velocity, at the
same frequency as the ROI acquisition.
I. INTRODUCTION
Vision is used at several levels in robotics, particularly in
localization, identification [1] and control [2], [3]. However,
the low frame rate of video sensors is an evident drawback
in applications requiring a high sampling frequency. Indeed,
standard high-speed cameras deliver about 120 Hz while high-speed
dynamic control applications typically run at 1 kHz.
Nevertheless, it has been reported that high-speed vision
could be used in the dynamic control of serial robots [4], where
a Generalized Predictive Control (GPC) scheme was associated
with a visual loop linearization to adapt the video rate (120
Hz) to the control sampling frequency (500 Hz). However,
this solution increases control complexity. An alternative
solution is to increase the video rate to reach the system
sampling frequency. To do so, different approaches have been
presented in the literature.
Usually, the camera video rate is limited by the bandwidth
of the transmission interface. Reducing the image resolution
to decrease the video flow severely narrows the field of view of
the camera for a given accuracy of the end-effector pose estimation.
To solve this problem, different approaches are possible,
such as increasing the video rate through more efficient
video compression [5], faster transmission interfaces
(for instance, CamLink) or embedding the signal processing
close to the acquisition system [6], [7]. Nevertheless, we
believe that the optimal solution is to increase the information
density of the video flow. Indeed, the current approach in vision-based
applications is to grab and transmit the whole image,
extract the interesting features to process, and throw away
the rest of the image. For instance, to provide vision-based
pose estimation of a moving object from a single image,
This work was supported by Région d'Auvergne through the
Innov@pôle project and by the European Union through the
Integrated Project NEXT no. 0011815.
four non-degenerate point projections are enough [8]. The
ratio between the amount of data needed to perform the pose
estimation and the transmitted flow of an acquired image of
size S is given by (4 × 2 × precision_size) / (S × unsigned_char_size).
For a mega-pixel image (with precision_size = 8 bytes and
unsigned_char_size = 1 byte) the ratio is 6.4 × 10^-5: the
transmitted data is more than 1.5 × 10^4 times the needed amount.
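The arithmetic behind this ratio can be checked in a few lines; the byte sizes below are assumptions chosen to be consistent with the stated 6.4 × 10^-5 figure, not values given explicitly in the text:

```python
# Back-of-the-envelope check of the data-ratio argument.
N_POINTS = 4          # minimum non-degenerate points for pose estimation
COORDS = 2            # (u, v) image coordinates per point
PRECISION_SIZE = 8    # bytes per coordinate (double precision, assumed)
IMAGE_SIZE = 10**6    # pixels in a mega-pixel image
CHAR_SIZE = 1         # bytes per pixel (unsigned char)

needed = N_POINTS * COORDS * PRECISION_SIZE   # 64 bytes actually useful
transmitted = IMAGE_SIZE * CHAR_SIZE          # 10^6 bytes sent

ratio = needed / transmitted
print(ratio)                   # 6.4e-05
print(transmitted / needed)    # 15625.0, i.e. more than 1.5 x 10^4
```

With these assumed sizes the computation reproduces both numbers quoted in the text.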
Instead of transmitting the whole image and then selecting
regions of interest (ROI), it is more interesting, from
the data-flow and silicon-cost points of view, to
invert the process: first select the ROI position, and
then transmit it. Note that in both cases the ROI
positions are predicted, so there is no difference between
the two approaches if the rest of the image is not used. This
acquisition mode was proposed in [9], where a new CMOS
camera was designed to grab multiple ROIs simultaneously.
The same approach can be performed using an off-the-shelf
fast reconfigurable CMOS camera which uses
the CamLink interface. A single rectangular area can be
selected for shuttering and transmission, and its parameters can
be changed dynamically at each acquisition. By grabbing
only the areas of the scene that contain information, such as
interest points or blobs (Figure 1), the information density
of the video flow is increased. The direct effect is
that the ROI acquisition frequency can be multiplied by
the ratio of the full image size to the size of the grabbed
area. For instance, transmitting ten regions of interest of
10 × 10 pixels that contain the desired information, instead
of the full 1024 × 1024 pixel image, reduces the data flow
from 1M pixels to 1K pixels, and theoretically multiplies
the acquisition frequency by 1000. In practice, transmission
control bits, parameter setting and exposure time limit the
video rate. Note that the exposure time and the acquisition
frequency can also be controlled.
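As a rough sketch, the data-flow reduction and the theoretical frequency multiplier quoted above can be reproduced as follows (the exact factor for a 1024 × 1024 frame is slightly above the rounded value of 1000 given in the text):

```python
# Illustrative data-flow comparison, using the sizes from the example.
FULL_IMAGE = 1024 * 1024   # pixels in the full frame
ROI_SIDE = 10              # each ROI is 10 x 10 pixels
N_ROIS = 10                # regions grabbed per acquisition cycle

roi_pixels = N_ROIS * ROI_SIDE * ROI_SIDE   # 1000 pixels, i.e. ~1K
speedup = FULL_IMAGE / roi_pixels           # theoretical frequency multiplier

print(roi_pixels)   # 1000
print(speedup)      # 1048.576, i.e. roughly 1000x
```

In practice this multiplier is an upper bound, since transmission overhead and exposure time also limit the achievable rate.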
Unfortunately, the sequential acquisition of partial areas of the
retina introduces time delays between acquisitions and distorts
the image projection of moving objects. Thus, classical pose
estimation algorithms cannot be used in this case. In addition,
these methods only enable the estimation of successive poses;
the velocity information is generally retrieved by numerically
differentiating the pose measurements, which introduces
additional noise.
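A minimal simulation illustrates why differentiating noisy pose measurements amplifies noise; the noise level and sampling rate below are hypothetical, not values taken from the paper:

```python
import random

DT = 1.0 / 120.0    # sampling period of a 120 Hz camera (assumed)
SIGMA = 0.001       # pose measurement noise std dev, in meters (assumed)

random.seed(0)
# True position is constant (zero velocity); measurements are pure noise.
poses = [random.gauss(0.0, SIGMA) for _ in range(1000)]
# Finite-difference velocity estimate between consecutive samples.
vels = [(b - a) / DT for a, b in zip(poses, poses[1:])]

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# For independent noise, differencing scales the std dev by sqrt(2)/DT,
# here roughly a factor of 170.
print(std(poses))
print(std(vels))
```

The velocity estimate's noise grows with the sampling rate, which is exactly why differentiating high-frequency pose measurements is undesirable.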
To compute pose and velocity at each sample, one approach
consists in using data-fusion methods on a set
of partial time-varying information (e.g., a Kalman filter [6]).
However, this approach assumes Gaussian noise, which
is not guaranteed in pose measurement applications. To
2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, May 19-23, 2008. ©2008 IEEE.