To appear in International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS’04), April 21-23, 2004, Lisboa, Portugal.

IMPACT OF TOPOLOGY CHANGES IN VIDEO SEGMENTATION EVALUATION

Elisa Drelie Gelasca, Touradj Ebrahimi
Signal Processing Institute
Swiss Federal Institute of Technology EPFL
CH-1015 Lausanne, Switzerland

Mylène C. Q. Farias, Sanjit K. Mitra
Department of Electrical Engineering
University of California Santa Barbara
Santa Barbara, CA 93106, USA

ABSTRACT

This work addresses the problem of studying and characterizing topology changes between resulting and reference segmentation masks in video sequences. In particular, the goal of this paper is to examine the impact of individual and combined artifacts found in video object segmentation applications (e.g., added regions and holes). Added regions and holes artifacts are synthetically generated and inserted in a segmentation mask. We performed a psychophysical experiment in which human subjects were asked to rate the annoyance of the generated artifacts when presented alone or in combination. The results show how individual objective metrics can be derived and how an overall objective metric can be predicted by linearly combining individual segmentation errors for a specific video content.

1. INTRODUCTION

Applications such as object-based coding, video databases, interactive video and remote surveillance are based on a representation of the video content in terms of video objects. The first step of object-based applications is the identification of the areas of a video sequence that correspond to meaningful regions, i.e., objects. This step is generally performed by a segmentation algorithm.

During the past three decades, different video segmentation techniques have been proposed to extract the objects of interest from a video sequence.
However, no single segmentation technique is universally useful for all applications, and different techniques are not equally suited to a particular task. In recent years, in order to properly evaluate the performance of segmentation techniques, objective metrics have been proposed [1], [2], [3], [4], [5]. To validate an objective metric, subjective experiments need to be performed. Subjective experiments are also used as a research tool to better understand how humans perceive artifacts and judge quality. On the basis of the analysis of the subjective data, topology changes between a reference and the resulting segmentation mask (or segmentation artifacts) can be characterized and a more reliable objective metric can be developed.

With this purpose, in this paper we present an analysis of two artifacts produced by typical segmentation algorithms (i.e., added regions and holes). A subjective test has been carried out to measure the annoyance of these artifacts when presented alone or in combination. We studied the different levels of annoyance produced by these artifacts at different sizes and by their combinations. The idea is to develop individual metrics for the most relevant artifacts and to combine them into an overall quality metric, as a step towards a perceptually driven segmentation evaluation metric.

Fig. 1. Reference segmentation R overlapped with the resulting segmentation C. The spatial artifacts under investigation (added region and hole) are depicted.

In this paper, we also present an experimental method for subjective evaluation of segmented video sequences. Defining a formal method for subjective tests in video object segmentation quality assessment is useful since, to the best of our knowledge, only informal tests have been performed [3], [4].

The paper is organized as follows.
Section 2 describes the synthetic artifacts and the test sequences generated for the subjective experiments. The experimental method is described in Section 3. Subjective results are analyzed in Section 4. Finally, Section 5 draws the conclusions.

2. GENERATION OF SYNTHETIC ARTIFACTS

To determine the topology changes between a reference and a resulting segmentation mask, the difference between the two segmentation masks has to be computed. These changes (segmentation artifacts) can affect the quality of a segmented video in two ways: statically (spatially) and dynamically (temporally). In this work, we concentrated on the annoyance of two kinds of spatial artifacts, added regions and holes, which are among the spatial artifacts typically introduced by the most common segmentation algorithms. Segmentation artifacts are defined by the amount of mis-segmented pixels (or pixel errors) present in the resulting segmentation mask. An algorithm for object segmentation can in principle be evaluated by estimating only these pixel errors [1], [3], [4], [5].

In this paper, we focused on segmentation of moving objects in video sequences. An object is a semantically meaningful region. Let us define R as the set of all the objects belonging to the reference segmentation mask. Similarly, C is defined as the set of all the objects and regions in the resulting segmentation mask.

Pixels in the resulting segmentation mask C which do not belong to the reference segmentation mask R are defined as false