Dynamic background modeling for moving objects detection using a mobile stereo camera

Alessandro Moro, Enzo Mumolo
University of Trieste, Trieste, ITALY

Massimiliano Nolich
IFACE s.r.l., Trieste, ITALY

Kenji Terabayashi, Kazunori Umeda
Chuo University, CREST JST, Tokyo, JAPAN

Abstract—Background updating is fundamental in moving objects detection applications. This paper proposes a background updating method for a moving stereo camera. The proposed algorithm is based on the detection of the regions of the image with the highest color intensity in the scene (called light zones). From these light zones, keypoints are extracted and matched between the previous background image and the current foreground image. Image registration is performed by moving the old background image according to the keypoint matches, so that the foreground and background images are closely aligned. The proposed method requires that the camera moves slowly, and it is used for moving objects detection with background subtraction. Three types of keypoints are tested with the same homography: light zone, SIFT, and SURF keypoints. We show experimentally that, on average, light zone keypoints perform equally to or better than SIFT keypoints and are faster to compute; moreover, the SURF keypoints perform worse. To obtain better performance, when the light zone keypoints fail, the SIFT keypoints are used in a data fusion framework.

I. INTRODUCTION

Moving object detection from a moving camera is fundamental in many mechatronic tasks, including autonomous and industrial robotics and transportation systems. Most moving objects detection schemes refer to fixed cameras. The main difference between motion detection from a fixed camera and from a moving camera is the creation of the background model. In this paper we deal with the problem of achieving a stable background while the camera moves.
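As a rough illustration of the light-zone idea (the actual detection procedure is described in Section IV; the percentile threshold used here is a hypothetical choice, not the paper's), the brightest regions of a frame can be isolated by intensity thresholding and summarized by their centroids:

```python
import numpy as np

def light_zone_mask(gray, percentile=99.5):
    """Mark pixels whose intensity falls in the top fraction of the frame."""
    thresh = np.percentile(gray, percentile)
    return gray >= thresh

def light_zone_keypoint(gray, percentile=99.5):
    """Return the centroid of the brightest pixels as a single (row, col) keypoint."""
    mask = light_zone_mask(gray, percentile)
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

# Synthetic frame: dark background with one bright spot centered at (20, 30).
frame = np.zeros((64, 64))
frame[18:23, 28:33] = 1.0
print(light_zone_keypoint(frame))  # centroid at (20.0, 30.0)
```

A real implementation would extract one keypoint per connected bright region rather than a single global centroid; this sketch only conveys why light zones are cheap to detect compared to gradient-based descriptors.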
In this work, we consider the following camera movements: rotations about the vertical camera axis and translations along the optical axis. The basic idea is to acquire an initial background image and to align it to the subsequent frames. If the camera movement is slow, there are several features that can be matched between the two images. The alignment is obtained by moving each pixel of the background image according to a registration matrix computed from the correspondence between the anchor points (in the following called keypoints) detected in the background and foreground images.

The main contribution of this paper is the use of the light zones detected in the images to extract keypoints. Light zones are intrinsic features of every image and include reflecting surfaces and light emitting devices. When the light zone keypoints fail, the SIFT keypoints are used in a data fusion framework. The proposed method has better performance than using SIFT or SURF keypoints in terms of computational complexity, number of correctly matched frames, and quality of alignment. It is worth noting that the light zones do not describe object characteristics, but only the light and reflectance properties of the environment; hence, their field of application is more limited than that of SIFT or SURF. We will show that SIFT and light zone keypoints have similar performance in the background updating task; however, the SIFT keypoints are about one order of magnitude more computationally complex than the light zone keypoints, and the SURF keypoints are simpler to compute but perform worse. The background updating algorithm described in this paper has been used in a background subtraction moving objects detection framework.

This paper is structured as follows. In Section II previous work on background modeling and updating is summarized. In Section III an overview of the proposed background modeling is presented.
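To make the registration step concrete, the following sketch estimates an alignment from matched keypoints and applies it to the background. For simplicity it assumes a pure-translation camera model rather than the full homography used in the paper, and all point values are made-up illustration data:

```python
import numpy as np

def estimate_translation(bg_pts, fg_pts):
    """Least-squares translation mapping background keypoints onto foreground keypoints."""
    return np.mean(np.asarray(fg_pts, float) - np.asarray(bg_pts, float), axis=0)

def register_background(background, shift):
    """Shift the background image by an integer pixel offset (rows, cols)."""
    dr, dc = np.round(shift).astype(int)
    return np.roll(np.roll(background, dr, axis=0), dc, axis=1)

# Hypothetical matched keypoints: the camera motion shifted everything by (2, 3).
bg_pts = [(10, 10), (40, 12), (25, 50)]
fg_pts = [(12, 13), (42, 15), (27, 53)]
shift = estimate_translation(bg_pts, fg_pts)
print(shift)  # [2. 3.]
```

With a full homography, `estimate_translation` would be replaced by a robust 3x3 matrix estimate (e.g. RANSAC over the correspondences) and `register_background` by a perspective warp; the keypoint-to-registration pipeline is the same.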
A detailed presentation of keypoints based on light zones is reported in Section IV, and the registration and updating technique is presented in Section V. In Section VI SIFT and SURF features are briefly recalled, and their performance is compared to light zone keypoints. Finally, Section VII reports some concluding remarks.

II. RELATED WORK

There are many papers dealing with background modeling, mostly related to fixed cameras and mobile object detection. To this end, early approaches assumed a stable background coupled with a simple, known noise process, or assumed a pixel-wise statistical model that conformed to a Gaussian [1]. Although stable in controlled indoor conditions, these techniques are sensitive to global illumination changes or to local pixel variation that is not modeled in the noise term. More sophisticated approaches using a multi-modal Gaussian Mixture Model (GMM), for example [2], were introduced to deal with more scene changes than previously possible. Non-parametric estimation of a probability density function (pdf) for both background and foreground was introduced by Elgammal et al. [3] to partly alleviate this. Fewer papers deal with moving cameras, as compared to fixed cameras. Notably, [4] deals with pan/tilt camera movements. The authors describe approaches for coping with inaccuracies due to motion.

The 8th France-Japan and 6th Europe-Asia Congress on Mechatronics, November 22-24, 2010, Yokohama, Japan
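The early pixel-wise Gaussian model mentioned in the related work can be sketched as follows: each pixel keeps a running mean and variance, a pixel far from its mean is labeled foreground, and the statistics are updated only where the scene is judged to be background. The learning rate, deviation threshold, and initial variance below are hypothetical values, not taken from [1]:

```python
import numpy as np

class GaussianBackground:
    """Single-mode per-pixel running Gaussian background model (fixed camera)."""
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 25.0)  # assumed initial variance
        self.alpha, self.k = alpha, k

    def update(self, frame):
        d = frame.astype(float) - self.mean
        foreground = d ** 2 > (self.k ** 2) * self.var
        bg = ~foreground  # adapt statistics only on background pixels
        self.mean[bg] += self.alpha * d[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * d[bg] ** 2
        return foreground

model = GaussianBackground(np.full((4, 4), 100.0))
frame = np.full((4, 4), 101.0)
frame[1, 2] = 200.0  # one pixel covered by a moving object
fg = model.update(frame)  # only fg[1, 2] is True
```

The GMM of [2] extends this by maintaining several weighted Gaussians per pixel, which handles multi-modal backgrounds (e.g. swaying trees) that a single Gaussian cannot.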