Video object segmentation and its salient motion detection using adaptive background generation

T.K. Kim, J.H. Im and J.K. Paik

Video object segmentation often fails when the background and foreground contain a similar distribution of colours. Proposed is a novel image segmentation algorithm to detect salient motion in a complex environment by combining temporal difference and background generation. Experimental results show that the proposed algorithm provides a matching ratio twice as high as that of conventional Gaussian mixture-based approaches under various conditions.

Introduction: Background generation [1] has been an early criterion for video object segmentation, while foreground modelling has recently been used in conjunction with background modelling for more accurate movement detection. In many video applications, salient motion must be differentiated from uninteresting motions. As in [1] and [2], salient motion is defined as motion from a typical surveillance target, as opposed to distracting motions such as the scintillation of a ceiling fan or the swaying of tree branches in the wind. To detect moving objects in a dynamic scene, adaptive background generation techniques have been developed [3]. Monnet et al. [4] proposed a prediction-based online method for modelling dynamic scenes. Their approach has been tested on a coastline with ocean waves and a scene with swaying trees. However, it needs hundreds of motion-free images to learn the background model, and moving objects cannot be detected if they move in the same direction as the ocean waves. In [5], a dense optical flow map is created to infer, by predetermined rules, foreground objects that move in the opposite direction, move in a group, or stay stationary.
Recently, the Gaussian mixture method has become popular because it can deal with slow lighting changes, periodic motion from a cluttered background, slowly moving objects, long-term scene changes, and camera noise. In spite of these advantages, it cannot adapt to quick lighting changes and cannot successfully handle shadows. In this Letter, we present a robust real-time method that provides a realistic foreground segmentation to detect salient motions in complex environments by combining temporal difference and background generation. The proposed method, shown in Fig. 1, aims at performing real-time background generation and salient motion detection of moving objects.

Fig. 1 Proposed video object segmentation algorithm using adaptive background generation

Background modelling and updating: As the first step, we estimate the optical flow between two images $I_t(x, y)$ and $I_{t+1}(x, y)$ by minimising the Euclidean distance defined as

$$E(d_x, d_y) = \sum_{x = u_x - w}^{u_x + w} \; \sum_{y = u_y - w}^{u_y + w} \left( I_t(x, y) - I_{t+1}(x + d_x, y + d_y) \right)^2 \quad (1)$$

for each pixel $(u_x, u_y)$ in $I_{t+1}(x, y)$, where $(d_x, d_y)$ represents the displacement of the pixel at $(u_x, u_y)$ and is initially set to zero. The displacement is iteratively updated as

$$(d_x, d_y)^{n+1} = \left[ (d_x, d_y) + \left( \sum_{x = u_x - w}^{u_x + w} \; \sum_{y = u_y - w}^{u_y + w} \nabla I \, (\nabla I)^T \right)^{-1} \sum_{x = u_x - w}^{u_x + w} \; \sum_{y = u_y - w}^{u_y + w} \left( I_t(x, y) - I_{t+1}(x + d_x, y + d_y) \right) \nabla I \right]^{n} \quad (2)$$

where $w$ represents the size of the neighbouring window. The vector $\nabla I_{t+1} = \left[ \partial I_{t+1} / \partial x \;\; \partial I_{t+1} / \partial y \right]^T$ represents the image gradient [6]. If $E$ is smaller than a pre-specified threshold, the background is updated at the corresponding window. In the experiment we have used 0.35 for the threshold value.
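The window-based displacement estimation of equations (1) and (2) can be sketched as an iterative Lucas-Kanade-style solver. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, nearest-neighbour sampling of the warped window, the window size, and the iteration count are all choices made here.

```python
import numpy as np

def estimate_displacement(I_t, I_t1, u, w=2, n_iters=5):
    """Iteratively estimate (d_x, d_y) for the pixel at u = (u_y, u_x) by
    minimising the windowed SSD of equation (1), using the normal-equation
    update of equation (2). Names and defaults are illustrative."""
    uy, ux = u
    d = np.zeros(2)  # (d_x, d_y), initialised to zero as in the Letter
    # image gradients of I_{t+1}; np.gradient returns (d/dy, d/dx)
    gy, gx = np.gradient(I_t1.astype(float))
    for _ in range(n_iters):
        A = np.zeros((2, 2))  # sum of  grad I (grad I)^T
        b = np.zeros(2)       # sum of  (I_t - I_{t+1}(x+d_x, y+d_y)) grad I
        for y in range(uy - w, uy + w + 1):
            for x in range(ux - w, ux + w + 1):
                # crude nearest-neighbour sampling of the displaced window
                xs = int(round(x + d[0]))
                ys = int(round(y + d[1]))
                g = np.array([gx[ys, xs], gy[ys, xs]])
                r = float(I_t[y, x]) - float(I_t1[ys, xs])
                A += np.outer(g, g)
                b += r * g
        # least squares guards against a singular structure tensor
        d_step, *_ = np.linalg.lstsq(A, b, rcond=None)
        d += d_step
    return d
```

On a synthetic ramp image shifted by one pixel, the solver recovers the displacement exactly in one iteration; on natural textures a bilinear-interpolated warp would be the usual refinement.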
For a window with a high $E$ value the background is generated by minimising $E$, while the median filter is used for the remaining windows. To overcome the drawbacks of a median filter under dynamic conditions, it is necessary to keep updating the background as

$$B_t(x, y) = (1 - \alpha) I_t(x, y) + \alpha B_{t-1}(x, y) \quad (3)$$

where $B_t(x, y)$ represents the background at time $t$, $I_t(x, y)$ the input image at time $t$, and $\alpha$ the mixing ratio in the range [0, 1]. To detect an object's salient motion in the background, we use the initial background from the previous frame, $B_{t-1}(x, y)$. Fig. 2 shows the background generation results of various scenes: the four upper images show the captured input sequence images, and the four bottom images represent the generated background in each pair.

Fig. 2 Results of background generation using proposed algorithm
a–d: Input sequence images at 425th, 500th, 551st, 651st frames
e–h: Results of background generation at 425th, 500th, 551st, 651st frames

Video object segmentation using proposed algorithm: In this Letter, temporally adjacent images $I_t(x, y)$ and $I_{t+1}(x, y)$ are subtracted and a threshold is applied to the difference image to extract the entire region of change. To detect slow motion or static objects, a fixed weighted accumulation is used to compute the temporal difference image $\Delta I_t(x, y)$ as

$$\Delta I_t(x, y) = \begin{cases} 1, & \text{if } (1 - \lambda) AI_t(x, y) + \lambda \left| I_t(x, y) - I_{t+1}(x, y) \right| > T \\ 0, & \text{otherwise} \end{cases} \quad (4)$$

where $\lambda$ is the weighting parameter describing the temporal range for accumulating difference images, and $AI_t(x, y)$ is initialised to an empty image. In this Letter, we set $T = 20$ and $\lambda = 0.5$ for all experiments. We assume that the foreground with salient motion shows consistency over a period of time in both temporal difference and background subtraction; that is, the optical flow of a region with salient motion in the given time period $[t_1, t_{n-1}]$ should be in the same direction.
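The running-average background update of equation (3) and the accumulated temporal difference of equation (4) can be sketched as below. This is a hedged sketch: the Letter gives $T = 20$ and $\lambda = 0.5$ but does not state $\alpha$, so the value of `ALPHA` here is an assumption, as is the choice to carry the weighted sum forward as the next accumulator $AI_{t+1}$.

```python
import numpy as np

ALPHA = 0.9   # mixing ratio alpha: assumed value; the Letter only bounds it to [0, 1]
LAMBDA = 0.5  # weighting parameter lambda, as used in the Letter
T = 20        # threshold T, as used in the Letter

def update_background(B_prev, I_t, alpha=ALPHA):
    """Equation (3): B_t = (1 - alpha) * I_t + alpha * B_{t-1}."""
    return (1.0 - alpha) * I_t + alpha * B_prev

def temporal_difference(AI_t, I_t, I_t1, lam=LAMBDA, thresh=T):
    """Equation (4): threshold the weighted accumulation of frame differences.
    Returns the binary mask DI_t and the updated accumulator (an assumption:
    the weighted sum is reused as AI_{t+1})."""
    acc = (1.0 - lam) * AI_t + lam * np.abs(I_t.astype(float) - I_t1.astype(float))
    DI_t = (acc > thresh).astype(np.uint8)
    return DI_t, acc
```

With $\alpha$ close to 1 the background adapts slowly, which is what keeps briefly stationary foreground objects from being absorbed into $B_t$.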
The salient motion is detected using the temporal difference combined with background subtraction, which makes the method robust to changes in illumination, whereas simple background subtraction alone exhibits inaccurate results. The output of salient motion detection is obtained as

$$SMD_t(x, y) = \Delta I_t(x, y) \cap \left( \left| B_t(x, y) - I_t(x, y) \right| > T \right) \quad (5)$$

where $B_t(x, y)$ represents the background image generated by the proposed algorithm. In this Letter, the difference between $I_t(x, y)$ and $B_t(x, y)$ is computed, and the difference image is then thresholded to obtain the change in motion. Fig. 3 shows the temporal differences, the subtracted background images, and the detected salient motion regions.

ELECTRONICS LETTERS 21st May 2009 Vol. 45 No. 11
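The intersection in equation (5) reduces to an element-wise AND of two binary masks. A minimal sketch follows; note the threshold reused for the background subtraction is an assumption (the Letter states $T = 20$ only for equation (4)), and the function name is illustrative.

```python
import numpy as np

def salient_motion(DI_t, B_t, I_t, thresh=20):
    """Equation (5): intersect the temporal-difference mask DI_t with the
    thresholded background subtraction |B_t - I_t| > thresh.
    thresh reuses T = 20 from equation (4) as an assumption."""
    bg_mask = (np.abs(B_t.astype(float) - I_t.astype(float)) > thresh).astype(np.uint8)
    return DI_t & bg_mask
```

Requiring both cues suppresses pixels that change only in one of them, e.g. a shadow that alters the background difference but produces no consistent temporal difference.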