Temporal Edges and Spatial Classification for Video Object Segmentation

Yuh Ren Choo, Pau-Choo Chung, Chich-Ling Huang and Jar-Ferr Yang
Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
Tachun Wang and Chen-Chiung Hsieh
Institute for Information Industry, Taipei, Taiwan, R.O.C.
yrchoo@neural.ee.ncku.edu.tw

Abstract

Extracting semantic video object planes (VOPs) is an important step toward the success of MPEG-4. In video object plane segmentation, combining spatial and temporal information has been regarded as a promising approach. This paper proposes a spatio-temporal algorithm for extracting the VOPs of image sequences. The method differs from traditional algorithms in that it combines temporal edges with asymmetric fuzzy-C-means spatial region classification. With the proposed temporal edges, the generality of temporal information is retained while computation time is reduced compared with motion vectors. The proposed spatial classification scheme, asymmetric fuzzy-C-means, takes into account the degrees of dispersion and the orientations of pattern distributions, so more accurate classification results can be obtained. Experiments show that the proposed spatio-temporal algorithm can effectively segment video object planes against a nearly static background, without being affected by lighting sources and shadows.

I. Introduction

With the increasing popularity of multimedia applications, new coding techniques that allow variable bit-rate transmission and content-based interactivity are needed. To meet this need, the MPEG-4 standard was developed around the concept of video object planes (VOPs), providing region-based coding along with content-based interaction. The decomposition of video images into VOPs has therefore been considered one of the essential steps for the success of MPEG-4.
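To give a rough feel for why temporal edges are cheaper than motion vectors, the following is a minimal sketch in which temporal edges are taken as the thresholded gradient magnitude of the inter-frame difference image. This formulation is an illustrative assumption, not the paper's exact definition; the function name `temporal_edges` and both thresholds are hypothetical.

```python
import numpy as np

def temporal_edges(frame_prev, frame_curr, diff_thresh=15, grad_thresh=20):
    """Sketch: edges of the inter-frame difference image.

    Illustrative assumption: we threshold the gradient magnitude of the
    absolute frame difference. Only one subtraction and one gradient pass
    per frame are needed, avoiding costly block-matching motion estimation.
    """
    # Absolute inter-frame difference (int32 avoids uint8 wrap-around).
    diff = np.abs(frame_curr.astype(np.int32) - frame_prev.astype(np.int32))
    diff = np.where(diff >= diff_thresh, diff, 0)  # suppress small noise

    # Central-difference gradients of the difference image.
    gy, gx = np.gradient(diff.astype(np.float64))
    mag = np.hypot(gx, gy)
    return mag >= grad_thresh  # boolean temporal-edge map
```

On two grayscale frames in which a bright object moves against a nearly static background, the returned map is non-zero only near the object's displaced boundary, which matches the paper's setting of a nearly static background.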
Related work on VOP segmentation has been published in [1-6]. Some of these methods use the watershed transform for region partitioning and map the resulting regions onto subsequent frames [2][7], while others use joint intensity and motion-vector features to divide the image into disjoint regions. These approaches decompose the image sequence into individual regions rather than complete, meaningful objects. Furthermore, such decomposition methods are easily affected by inhomogeneity in the low-level features they employ. On the other hand, motion fields are adopted as a basic feature for