Bayesian 3D Independent Motion Segmentation with IMU-aided RGB-D Sensor

Jorge Lobo 1, João Filipe Ferreira 1, Pedro Trindade 2 and Jorge Dias 1,3

*The research leading to these results has been partially supported by the HANDLE project, which has received funding from the European Community's 7th Framework Programme under grant agreement ICT 231640.
1 Jorge Lobo, João Filipe Ferreira and Jorge Dias are with the Institute of Systems and Robotics (ISR) and the Department of Electrical and Computer Engineering, University of Coimbra, Portugal {jlobo, jfilipe, jorge}@isr.uc.pt
2 Pedro Trindade is with the ISR - Institute of Systems and Robotics, University of Coimbra, Portugal pedrotrindade@isr.uc.pt
3 Jorge Dias is also with the Robotics Institute, Khalifa University, Abu Dhabi, UAE.

Abstract— In this paper we propose a two-tiered hierarchical Bayesian model to estimate the location of objects moving independently from the observer. Biological vision systems are very successful in motion segmentation, since they efficiently resort to flow analysis and accumulated prior knowledge of the 3D structure of the scene. Artificial perception systems may also build 3D structure maps and use optical flow to provide cues for ego- and independent motion segmentation. Using inertial and magnetic sensors and an image and depth sensor (RGB-D), we propose a method to obtain registered 3D maps, which are subsequently used in a probabilistic model (the bottom tier of the hierarchy) that performs background subtraction across several frames to provide a prior on moving objects. The egomotion of the RGB-D sensor is estimated starting with the angular pose obtained from the filtered accelerometer and magnetic data. The translation is derived from matched points across the images and corresponding 3D points in the rotation-compensated depth maps. A gyro-aided Lucas-Kanade tracker is used to obtain matched points across the images. The tracked points can also be used to refine the initial sensor-based rotation estimation. Having determined the camera egomotion, the optical flow estimated assuming a static scene can be compared with the observed optical flow via a probabilistic model (the top tier of the hierarchy), using the results of the background subtraction process as a prior, in order to identify volumes with independent motion in the corresponding 3D point cloud. To deal with the computational load, CUDA-based solutions on GPUs were used. Experimental results are presented showing the validity of the proposed approach.

I. INTRODUCTION

Motion cues play an essential part in perception – they are ubiquitous in the process of making sense of the surrounding world, both for humans and for robots. However, motion perception has long been considered a difficult problem in artificial perception; although there has been a substantial amount of work attempting to devise a solution using vision alone, the challenges posed by the need to distinguish between optical flow caused by self-motion of the observer (i.e. egomotion) and by objects or agents moving independently from the observer are far from trivial.

In biological vision systems, both static and dynamic inertial cues provided by the vestibular system also play an important role in perception. In particular, they are deeply involved in the process of motion sensing, and are fused with vision in the early stages of image processing (e.g., the gravity vertical cue). As a result, artificial perception systems for robotic applications have recently been taking advantage of low-cost inertial sensors to complement vision systems [1].
On the other hand, an interesting hypothesis has been raised by studies in neuroscience such as the one presented in [2], which states that there are fast routes in the brain used to rapidly paint a rough overall 3D view of an observed scene, which is then fed back to lower levels of 2D perceptual processing as a prior. In fact, it is also posited by several authors that accumulated prior knowledge of the 3D structure of the scene is retroinjected into the primary brain sites for flow analysis, thus modulating motion segmentation processing. Besides the work described in [1] and references therein, recent work has reexamined the Lucas-Kanade method for real-time independent motion detection [3].

In our approach we combine, in a probabilistic way, an inter-frame estimate of independent motion, based on the difference between the observed optical flow and the optical flow estimated from the scene depth map and observer egomotion, with a background subtraction method based on repeated observation of the same scene, to obtain a more robust independent motion segmentation.

The next section presents our approach for estimating the observer egomotion and registering the observed 3D point clouds to a common frame of reference. In Section III the two-tiered Bayesian hierarchical model for independent motion segmentation is presented, combining background subtraction with optical flow consistency. This is followed by experimental results and concluding remarks.

II. ESTIMATING EGOMOTION AND REGISTERING 3D POINT CLOUDS OF THE IMU-AIDED RGB-D SENSOR

A. Estimating and Compensating for Egomotion

A moving RGB-D observer of a background static scene with some moving objects computes at each instant a dense depth map (or point cloud) corresponding to the captured image. The point clouds change over time due to both the moving objects and the observer egomotion.
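The initial angular pose is obtained from the filtered accelerometer and magnetometer data. As a minimal sketch of how two such vector observations determine a rotation, the classic TRIAD construction can be used; note that the exact filtering and estimation method is not specified here, so this particular construction (and all names in it) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def rotation_from_inertial(accel, mag):
    """Sketch: recover the sensor-to-world rotation from one (filtered)
    accelerometer reading and one magnetometer reading, both 3-vectors in
    the sensor frame, via the TRIAD construction (assumed method).
    World frame is ENU: x east, y north, z up."""
    # A static accelerometer measures the reaction to gravity, i.e. "up".
    up_s = accel / np.linalg.norm(accel)
    # The magnetic field points (roughly) north and down; its cross
    # product with "up" gives the horizontal east direction.
    east_s = np.cross(mag, up_s)
    east_s /= np.linalg.norm(east_s)
    north_s = np.cross(up_s, east_s)
    # Columns are the world axes expressed in the sensor frame, so this
    # matrix maps world -> sensor; its transpose maps sensor -> world.
    S = np.column_stack((east_s, north_s, up_s))
    return S.T
```

With the sensor-to-world rotation in hand, each depth-map point can be rotation-compensated before the translation is estimated from matched points.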
A first step in processing the incoming data is to register the point clouds to a common fixed frame of reference {W}, as shown in Figure 1.

2012 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), September 13-15, 2012, Hamburg, Germany. 978-1-4673-2511-0/12/$31.00 ©2012 IEEE
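Given the sensor-based rotation, the registration step above reduces to simple rigid-body geometry: with rotation already compensated, the translation between frames is the difference of centroids of matched 3D points, and each cloud is then mapped into {W}. The sketch below assumes matched 3D point pairs are already available (the paper obtains them via the gyro-aided Lucas-Kanade tracker and the depth maps); all function and variable names are illustrative:

```python
import numpy as np

def translation_from_matches(p_sensor, p_world, R):
    """Sketch: closed-form translation given the sensor-to-world rotation R
    and N x 3 arrays of matched 3D points (sensor frame vs. their known
    coordinates in the fixed frame {W}). After rotation compensation the
    translation is the difference of the two centroids."""
    return p_world.mean(axis=0) - (R @ p_sensor.T).T.mean(axis=0)

def register_cloud(cloud, R, t):
    """Map an N x 3 point cloud into the common fixed frame {W}:
    p_w = R p_s + t for every point."""
    return (R @ cloud.T).T + t
```

This centroid-difference form is exact only for static matched points; in practice points on independently moving objects would have to be rejected (e.g. as outliers) before averaging.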