Multi-view video segmentation and tracking for video surveillance

Gelareh Mohammadi, Frederic Dufaux∗, Thien Ha Minh, Touradj Ebrahimi

Multimedia Signal Processing Group
Ecole Polytechnique Fédérale de Lausanne (EPFL)
CH-1015 Lausanne, Switzerland

ABSTRACT

Tracking moving objects is a critical step for smart video surveillance systems. Despite the increased complexity, multiple-camera systems offer the clear advantages of covering wide areas and handling occlusions by exploiting the different viewpoints. Multiple-camera systems raise several technical problems: installation, calibration, object matching, switching, data fusion, and occlusion handling. In this paper, we address the issue of tracking moving objects in an environment covered by multiple uncalibrated cameras with overlapping fields of view, typical of most surveillance setups. Our main objective is to create a framework that can be used to integrate object-tracking information from multiple video sources. The proposed technique consists of the following steps. We first run a single-view tracking algorithm on each camera view, and then apply a consistent object labeling algorithm across all views. In the next step, we verify the objects in each view separately for inconsistencies. Corresponding objects are extracted through a Homography transform from one view to the other and vice versa. Having found the corresponding objects in the different views, we partition each object into homogeneous regions. In the last step, we apply the Homography transform to find the region map of the first view in the second view and vice versa. For each region (in the main frame and in the mapped frame), a set of descriptors is extracted to find the best match between the two views based on the similarity of the region descriptors. This method is able to deal with multiple objects. Track management issues such as occlusion and the appearance and disappearance of objects are resolved using information from all views.
This method can track both rigid and deformable objects, and this versatility makes it suitable for different application scenarios.

Keywords: Multi-view, Object Tracking, Video Surveillance, Homography Transform

1. INTRODUCTION

Tracking moving objects is a key problem in computer vision and image processing. It is important in a wide variety of applications, such as three-dimensional (3D) broadcasting, virtual reality, special effects, image composition, human-computer interaction (HCI), video surveillance, human motion analysis and traffic monitoring.

Automatically monitoring people in crowded environments such as metro stations, city markets, or public parks has nowadays become feasible for several reasons. First, from the point of view of accuracy, human operators are likely to fail when monitoring crowded and cluttered environments through tens of cameras. Automatic techniques have become mature enough to be employed at least as a first automatic step that alerts human operators, reducing their effort and the sources of distraction. Second, from the economic point of view, the cost of mounting cameras and developing automatic solutions has declined in comparison to the cost of hiring human operators to watch them.

Despite the increased complexity, multiple-camera systems offer the clear advantages of covering wide areas and improving the management of occlusions by exploiting the different viewpoints. Single-camera tracking is limited in the scope of its applications. While it suits certain applications such as local environments, even simple surveillance applications demand the use of multiple cameras for two reasons. Firstly, one camera cannot provide adequate coverage of the environment because of its limited field of view (FOV). Secondly, it is desirable to have multiple cameras observing critical areas, to provide robustness against occlusion.
Multiple cameras provide a more complete history of an object’s actions in an environment. To take advantage of additional cameras, it is necessary to establish correspondence between the different views. Thus, we see a parallel between

∗ frederic.dufaux@epfl.ch