Online Moving Camera Background Subtraction

Ali Elqursh and Ahmed Elgammal
Rutgers University

Abstract. Several methods for background subtraction from a moving camera have recently been proposed. They use bottom-up cues to segment video frames into foreground and background regions. Because they lack explicit models, they can easily fail to detect a foreground object when such cues are ambiguous in parts of the video. This becomes even more challenging when videos must be processed online. We present a method that learns pixel-based models for the foreground and background regions and, in addition, segments each frame in an online framework. The method uses long-term trajectories within a Bayesian filtering framework to estimate motion and appearance models. We compare our method to previous approaches and show results on challenging video sequences.

1 Introduction

One may argue that the ultimate goal of computer vision is to learn and perceive the environment the way children do. Without access to pre-segmented visual input, infants learn to segment objects from the background using low-level cues. Inspired by this evidence, significant effort in the computer vision community has focused on bottom-up segmentation of images and videos. The problem has become ever more important with the proliferation of videos captured by moving cameras.

Our goal is to develop an algorithm for foreground/background segmentation from a freely moving camera in an online framework that can handle arbitrarily long sequences. Traditional video segmentation comes in different flavors depending on the application, but falls short of achieving this goal. In background subtraction, moving foreground objects are segmented by learning a model of the background under the assumption that the background is static. Alternatively, motion segmentation methods attempt to segment sparse point trajectories based on coherency of motion.
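To make the static-camera assumption concrete, the following is a minimal sketch of classical background subtraction using a running-average background model; the function names, the update rate alpha, and the threshold are illustrative choices, not part of the method proposed in this paper.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Running-average background model: blend the new frame into the
    current estimate. Valid only under a static-camera assumption."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(background, frame, threshold=30.0):
    """Label as foreground the pixels whose intensity deviates from the
    background model by more than the threshold."""
    return np.abs(frame - background) > threshold

# Toy usage: a single bright pixel appears against a dark background.
bg = np.zeros((4, 4))
frame = np.zeros((4, 4))
frame[1, 1] = 100.0
mask = foreground_mask(bg, frame)        # only pixel (1, 1) is flagged
bg = update_background(bg, frame, 0.1)   # model slowly absorbs the change
```

Once the camera moves, every pixel's intensity changes even for a static scene, which is why this simple per-pixel model breaks down and motivates the trajectory-based formulation.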
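The idea of grouping sparse point trajectories by coherency of motion can likewise be sketched with a toy example. The greedy clustering below groups tracks whose mean velocities agree within a tolerance; it is a simplified stand-in for the motion segmentation literature, and the function name and tolerance parameter are assumptions made for illustration.

```python
import numpy as np

def cluster_trajectories(trajectories, tol=1.0):
    """Group point tracks by motion coherency.

    trajectories: array of shape (N, T, 2) -- N tracks over T frames.
    Each track is summarized by its mean per-frame velocity; tracks are
    greedily assigned to the first cluster center within distance tol.
    Returns an integer label per track.
    """
    velocities = np.diff(trajectories, axis=1).mean(axis=1)  # (N, 2)
    labels = np.empty(len(velocities), dtype=int)
    centers = []
    for i, v in enumerate(velocities):
        for k, c in enumerate(centers):
            if np.linalg.norm(v - c) < tol:
                labels[i] = k
                break
        else:
            centers.append(v)          # start a new motion cluster
            labels[i] = len(centers) - 1
    return labels
```

Note that such a grouping says which tracks move together, but by itself carries no notion of which group is foreground and which is background, nor any appearance model, which is the gap discussed next.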
However, such methods lack a model of the appearance of the foreground or background. Video object segmentation attempts to segment an object of interest from the video with no model of the scene background. On the other hand, several techniques attempt to extend traditional image segmentation to the temporal domain; these are typically limited to segmenting a short window of time. It is frequently the case, however, that low-level cues are ambiguous if one considers only a short window of frames. Existing approaches either ignore this problem or resort to processing the whole video offline. Offline methods can

A. Fitzgibbon et al. (Eds.): ECCV 2012, Part VI, LNCS 7577, pp. 228–241, 2012.
© Springer-Verlag Berlin Heidelberg 2012