Robust Foreground Segmentation from Color Video Sequences Using Background Subtraction with Multiple Thresholds

Hansung Kim, Ryuuki Sakamoto, Itaru Kitahara†‡, Tomoji Toriyama, and Kiyoshi Kogure
Knowledge Science Lab, ATR, Keihanna Science City, Kyoto, 619-0288, Japan
Dept. of Intelligent Interaction Technologies, Univ. of Tsukuba, Tsukuba Science City, Ibaraki, 305-8573, Japan
E-mail: {hskim, skmt, toriyama, kogure}@atr.jp, kitahara@iit.tsukuba.ac.jp

Abstract  A new robust method is proposed for segmenting foreground regions from color video sequences using multiple thresholds and morphological processes. The background is observed over a long period, and its per-pixel mean and standard deviation are used for background subtraction. Shadow regions are eliminated using color components, and the final foreground silhouette is extracted by smoothing the boundaries of the foreground and eliminating errors inside and outside the regions. Experimental results show that the proposed algorithm works very well in various background and foreground situations.

Key words  Foreground segmentation, Background subtraction, Color model, Shadow elimination

1. Introduction

Object segmentation from a video sequence is an important problem in the image processing field, with applications such as video surveillance, teleconferencing, video editing, and human-computer interfaces. Conventional object segmentation algorithms are roughly classified into two categories based on their primary segmentation criteria. Approaches in the first category use spatial homogeneity as a criterion. Morphological filters are used to simplify the image, and a watershed algorithm is then applied to decide the region boundaries [1][2]. The segmentation results of these algorithms tend to track object boundaries more precisely than other methods because of the watershed step. However, their main drawback is high computational complexity.
Approaches in the second category exploit change detection in video sequences. Some algorithms use frame differencing [3], but the most common approach is background subtraction [4-6], which computes the difference between the current image and a static background image acquired in advance from multiple images over a period of time. Since this technique is very fast and distinguishes semantic object regions from static backgrounds, it has been used for years in many vision systems.

As an example of such a system, we have already developed an immersive free-viewpoint video system using multiple cameras [7]. The system reconstructs 3D models from captured video streams using a shape-from-silhouette method and generates realistic free-viewpoint video of those objects from a virtual camera. The shape-from-silhouette method is a very common way of converting silhouette contours into 3D objects [8-10], but the segmentation process creates a bottleneck. Although background subtraction methods provide fairly good results, they still have limitations in practical applications. Most conventional approaches rely on the brightness or color information of the images alone, which causes errors depending on the lighting conditions and the colors of the foreground objects. In addition, shadows cast by objects on the background and highlights produced by the lighting can cause serious errors.

In this paper, a robust foreground segmentation algorithm is proposed. It is assumed that the background is static and that the lighting conditions do not change drastically. We classify the pixels of the segmentation mask into four categories based on their reliability and refine them with color information and morphological processes. In the next two sections, we describe the overall flow and the detailed algorithms of the proposed method. Experimental results are shown in Section 4, and we conclude in Section 5.

2. Background model

The background is modeled in two distinct parts: a luminance model and a color model.
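The core idea of background subtraction with a statistical model can be sketched as follows. This is a minimal illustration, not the paper's exact method: the per-pixel mean and standard deviation are estimated from background-only frames, and a pixel is marked as foreground when it deviates from the mean by more than a multiple of its standard deviation. The threshold factor `k` and the noise floor `min_std` are illustrative values chosen here, not parameters from the paper.

```python
import numpy as np

def build_background_model(frames):
    """Estimate a per-pixel mean and standard deviation of luminance
    from a list of background-only frames (each an H x W array)."""
    stack = np.stack(frames).astype(np.float64)  # shape: (N, H, W)
    return stack.mean(axis=0), stack.std(axis=0)

def subtract_background(frame, mean, std, k=2.5, min_std=2.0):
    """Mark pixels deviating more than k standard deviations from the
    background mean as foreground. min_std floors the deviation so that
    perfectly flat background regions do not become over-sensitive."""
    sigma = np.maximum(std, min_std)
    return np.abs(frame.astype(np.float64) - mean) > k * sigma
```

In practice the model would be built from many frames observed over a long period, as the paper describes, and the binary mask would then be refined by the shadow elimination and morphological steps discussed later.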
The input video stream has three RGB channels, but these are very sensitive to noise and to changes in lighting conditions. Therefore, we use the luminance component of the color images for initial object segmentation. Image luminance is calculated with the following equation [11]:

Y = 0.299 R + 0.587 G + 0.114 B.  (1)
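Equation (1) is the standard ITU-R BT.601 luma weighting, and it can be applied to a whole image at once. A minimal sketch, assuming the image is a NumPy array with RGB channel order in the last axis:

```python
import numpy as np

def rgb_to_luminance(rgb):
    """Compute luminance Y = 0.299 R + 0.587 G + 0.114 B (Eq. 1)
    for an image of shape (..., 3) with RGB channel order."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```

Note that the three weights sum to 1.0, so a gray pixel (R = G = B) keeps its value unchanged.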