Independent Multimodal Background Subtraction

Domenico Bloisi and Luca Iocchi
Department of Computer, Control, and Management Engineering, Sapienza University of Rome, Italy

Background subtraction is a common method for detecting moving objects in video from static cameras, and it can achieve real-time performance. However, its effectiveness depends heavily on a good background model, particularly in dynamic scenes. This paper presents a novel real-time algorithm for creating a robust, multimodal background model. The proposed approach uses an on-line clustering algorithm to create the model and a novel conditional update mechanism to obtain an accurate foreground mask. A quantitative comparison with several state-of-the-art methods on a well-known benchmark dataset demonstrates the effectiveness of the approach.

1 INTRODUCTION

Background subtraction (BS) is a popular method for detecting moving objects in video from static cameras, and it can achieve real-time performance. BS aims to identify moving regions in image sequences by comparing the current frame to a model of the scene background (BG). Creating such a model is challenging because of illumination changes (gradual and sudden), shadows, camera jitter, movement of background elements (e.g., trees swaying in the breeze, waves in water), and changes in the background geometry (e.g., parked cars).

Different classifications of BS methods have been proposed in the literature. In (Cristani and Murino 2008), BS algorithms are organized into four classes: 1) per-pixel, 2) per-region, 3) per-frame, and 4) hybrid. Per-pixel approaches (e.g., (Cucchiara et al. 2003; Stauffer and Grimson 1999)) treat the signal at each pixel as an independent process. Per-region algorithms (e.g., (Heikkila and Pietikainen 2006)) usually divide the frames into blocks and compute block-specific features to obtain the foreground.
Frame-level methods look for global changes in the scene (e.g., (Oliver et al. 2000)). Hybrid methods (e.g., (Wang and Suter 2006; Toyama et al. 1999)) combine the previous approaches in a multi-stage process.

In (Cheung and Kamath 2004), two classes of BS methods are identified, namely recursive and non-recursive. Recursive algorithms (e.g., (Stauffer and Grimson 1999)) maintain a single background model that is updated with each new input frame. Non-recursive approaches (e.g., (Cucchiara et al. 2003; Oliver et al. 2000)) maintain a buffer of previous video frames and estimate the background model from a statistical analysis of these frames.

A third classification (e.g., (Mittal and Paragios 2004)) divides existing BS methods into predictive and non-predictive. Predictive algorithms (e.g., (Doretto et al. 2003)) model the scene as a time series and develop a dynamical model to recover the current input from past observations. Non-predictive techniques (e.g., (Stauffer and Grimson 1999; Elgammal et al. 2000)) neglect the order of the input observations and build a probabilistic representation of the observations at a particular pixel.

Although all the above-mentioned approaches can deal with dynamic backgrounds, a real-time, complete, and effective solution does not yet exist. In particular, water backgrounds are more difficult than other kinds of dynamic background, since waves in water do not belong to the foreground even though they involve motion. Per-pixel approaches (e.g., (Stauffer and Grimson 1999)) typically fail because these dynamic textures cause large changes at the individual pixel level (see Fig. 1) (Dalley et al. 2008). A non-parametric approach (e.g., (Elgammal et al. 2000)) is not able to learn all the changes, since the changes on the water surface do not follow any regular pattern (Tavakkoli and Bebis 2006). More complex approaches (e.g., (Sheikh and Shah 2005; Zhong and Sclaroff 2003; Zhong et al.
2008)) can obtain better results, but at the cost of a higher computational load.

In this paper, a per-pixel, non-recursive, non-predictive BS approach is described. It has been designed especially for dealing with water background,