Bayesian 3D Independent Motion Segmentation
with IMU-aided RGB-D Sensor
Jorge Lobo¹, João Filipe Ferreira¹, Pedro Trindade² and Jorge Dias¹,³
Abstract— In this paper we propose a two-tiered hierarchical
Bayesian model to estimate the location of objects moving
independently from the observer. Biological vision systems are
very successful in motion segmentation, since they efficiently
resort to flow analysis and accumulated prior knowledge of the
3D structure of the scene. Artificial perception systems may also
build 3D structure maps and use optical flow to provide cues
for ego- and independent motion segmentation. Using inertial
and magnetic sensors and an image and depth sensor (RGB-
D) we propose a method to obtain registered 3D maps, which
are subsequently used in a probabilistic model (the bottom
tier of the hierarchy) that performs background subtraction
across several frames to provide a prior on moving objects.
The egomotion of the RGB-D sensor is estimated starting with
the angular pose obtained from the filtered accelerometers
and magnetic data. The translation is derived from matched
points across the images and corresponding 3D points in the
rotation-compensated depth maps. A gyro-aided Lucas-Kanade
tracker is used to obtain matched points across the images. The
tracked points can also be used to refine the initial sensor-based
rotation estimation. Having determined the camera egomotion,
the estimated optical flow assuming a static scene can be
compared with the observed optical flow via a probabilistic
model (the top tier of the hierarchy), using the results of
the background subtraction process as a prior, in order to
identify volumes with independent motion in the corresponding
3D point cloud. To deal with the computational load, CUDA-
based solutions on GPUs were used. Experimental results are
presented showing the validity of the proposed approach.
I. INTRODUCTION
Motion cues play an essential part in perception – they are
ubiquitous in the process of making sense of the surrounding
world, both for humans and for robots. However, motion
perception has long been considered a difficult problem to
tackle in artificial perception; although there has been a
substantial amount of work in attempting to devise a solution
by solely using vision, the challenges faced by the need to
distinguish between optical flow caused by self-motion of the
observer (i.e. egomotion) and by objects or agents moving
independently from the observer are not at all trivial.
*The research leading to these results has been partially supported by the
HANDLE project, which has received funding from the European Commu-
nity’s 7th Framework Programme under grant agreement ICT 231640.
¹Jorge Lobo, João Filipe Ferreira and Jorge Dias are with the Institute
of Systems and Robotics (ISR) and the Department of Electrical and Com-
puter Engineering, University of Coimbra, Portugal {jlobo, jfilipe,
jorge}@isr.uc.pt
²Pedro Trindade is with the ISR - Institute of Systems and Robotics,
University of Coimbra, Portugal pedrotrindade@isr.uc.pt
³Jorge Dias is also with the Robotics Institute, Khalifa University, Abu
Dhabi, UAE.

In biological vision systems, both static and dynamic
inertial cues provided by the vestibular system also play an
important role in perception. In particular, they are deeply
involved in the process of motion sensing, and are fused with
vision in the early stages of image processing (e.g.,
the gravity vertical cue). As a result, artificial perception sys-
tems for robotic applications have recently been taking
advantage of low-cost inertial sensors to complement
vision systems [1].
On the other hand, an interesting hypothesis has been
raised by studies in neuroscience such as the one presented in [2],
which states that there are fast routes in the brain that
are used to rapidly paint the rough overall 3D view of an
observed scene, which is then fed back to lower levels of
2D perceptual processing as a prior. In fact, it is also posited
by several authors that an accumulated prior knowledge
of the 3D structure of the scene is retroinjected into the
primary brain sites for flow analysis, thus modulating motion
segmentation processing.
Besides the work described in [1] and references therein,
recent work has been done in reexamining the Lucas-Kanade
method for real-time independent motion detection [3].
In our approach we combine, in a probabilistic way, an
inter-frame estimate of independent motion, based on the
difference between observed optical flow and the estimated
optical flow given the scene depth map and observer ego-
motion, with a background subtraction method based on the
repeated observation of the same scene, to achieve a more
robust independent motion segmentation.
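The probabilistic combination just described can be sketched as a per-pixel Bayesian fusion: the background-subtraction result supplies a prior on independent motion, and the residual between observed and egomotion-predicted flow supplies the likelihood. The following is an illustrative sketch only, not the paper's model; the Gaussian noise level and the flat likelihood for moving points are assumed values:

```python
import numpy as np

def fuse_motion_evidence(prior_bg, flow_obs, flow_pred, sigma=1.0):
    """Bayesian fusion of a background-subtraction prior with a
    flow-consistency likelihood (illustrative sketch).

    prior_bg  : (H, W) prior probability of independent motion,
                from background subtraction across frames
    flow_obs  : (H, W, 2) observed optical flow
    flow_pred : (H, W, 2) flow predicted from egomotion and depth
    sigma     : assumed flow-noise standard deviation (pixels)
    """
    # Flow residual magnitude: large where the observed motion is
    # not explained by camera egomotion over a static scene.
    residual = np.linalg.norm(flow_obs - flow_pred, axis=-1)

    # Likelihood under each hypothesis: Gaussian noise if the point
    # is static, an assumed flat likelihood if it moves on its own.
    lik_static = np.exp(-0.5 * (residual / sigma) ** 2)
    lik_moving = np.full_like(lik_static, 0.1)

    # Posterior probability of independent motion (Bayes' rule).
    num = lik_moving * prior_bg
    den = num + lik_static * (1.0 - prior_bg)
    return num / den
```

With an uninformative prior, a pixel whose flow matches the egomotion prediction ends up with a low posterior, while a large residual drives the posterior close to one.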
The next section presents our approach for estimating
the observer egomotion and registering the observed 3D
point clouds to a common frame of reference. In Section III
the two-tiered Bayesian hierarchical model for independent
motion segmentation is presented, combining background
subtraction with optical flow consistency. This is followed
by some experimental results and concluding remarks.
II. ESTIMATING EGOMOTION AND
REGISTERING 3D POINT CLOUDS OF THE
IMU-AIDED RGB-D SENSOR
A. Estimating and Compensating for Egomotion
A moving RGB-D observer of a background static scene
with some moving objects computes at each instant a dense
depth map (or point cloud) corresponding to the captured
image. The point clouds will change in time due to both the
moving objects and the observer egomotion. A first step in
processing the incoming data is to register the point clouds to a
common fixed frame of reference {W}, as shown in Figure 1.
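A minimal sketch of this registration step, assuming the rotation of the camera with respect to {W} comes from the filtered inertial and magnetic data and the translation has already been estimated from matched points (both are taken as given inputs here; the function name is illustrative):

```python
import numpy as np

def register_point_cloud(points_cam, R_wc, t_w):
    """Register an RGB-D point cloud into the fixed world frame {W}
    (illustrative sketch).

    points_cam : (N, 3) 3D points in the camera frame
    R_wc       : (3, 3) rotation from the camera frame to {W},
                 e.g. from the filtered accelerometer/magnetometer data
    t_w        : (3,) camera position in {W}, from matched points
    """
    # Apply the rigid transform p_w = R_wc @ p_c + t_w to every
    # point at once (row vectors, hence the transpose).
    return points_cam @ R_wc.T + t_w
```

Once every frame's cloud is expressed in {W}, static background points stay put across frames and only independently moving objects change position, which is what makes the background-subtraction tier possible.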
2012 IEEE International Conference on Multisensor Fusion and Integration
for Intelligent Systems (MFI), September 13-15, 2012, Hamburg, Germany.
978-1-4673-2511-0/12/$31.00 ©2012 IEEE