PhD Forum: Multi-view occupancy maps using a network of low resolution visual sensors

Sebastian Gruenwedel, Vedran Jelaca, Peter Van Hese, Richard Kleihorst and Wilfried Philips
Ghent University, TELIN-IPI-IBBT
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
sebastian.gruenwedel@telin.ugent.be

Abstract—An occupancy map provides an abstract top view of a scene and can be used for many applications such as domotics, surveillance, elderly care and video teleconferencing. Such maps can be accurately estimated from multiple camera views. However, using a network of regular high-resolution cameras makes the system expensive and quickly raises privacy concerns (e.g. in elderly homes). Furthermore, their power consumption makes battery operation difficult. A solution could be the use of a network of low-resolution visual sensors, but their limited resolution could degrade the accuracy of the maps. In this paper we used simulations to determine the minimum resolution needed to derive accurate occupancy maps, which were then used to track people. Multi-view occupancy maps were computed from foreground silhouettes derived via an analysis of moving edges. Ground occupancies computed from each view were fused in a Dempster-Shafer framework. Tracking was done via a Bayes filter using the occupancy map per time instance as measurement. We found that for a room of 8.8 by 9.2 m, 4 cameras with a resolution as low as 64 by 48 pixels were sufficient to estimate accurate occupancy maps and track up to 4 people. These findings indicate that it is possible to use low-resolution visual sensors to build a cheap, power-efficient and privacy-friendly system for occupancy monitoring.

I. INTRODUCTION

Occupancy maps are an important step in many applications and are used for monitoring the activities of people (for instance, how many people are in a room, the whereabouts of these people, etc.).
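Concretely, an occupancy map can be represented as a discrete grid over the room's ground plane, where each cell stores a belief that it is occupied. The following minimal sketch illustrates this representation for the 8.8 m by 9.2 m room considered later in the paper; the 0.2 m cell size and the `mark_occupied` helper are assumptions chosen purely for illustration:

```python
import numpy as np

# Discretize the 8.8 m x 9.2 m ground plane into 0.2 m cells
# (the cell size is an assumption for illustration).
CELL = 0.2
nx, ny = round(8.8 / CELL), round(9.2 / CELL)  # 44 x 46 cells
grid = np.zeros((nx, ny))

def mark_occupied(grid, x_m, y_m, belief=1.0):
    """Raise the occupancy belief of the cell containing (x_m, y_m)."""
    i, j = int(x_m / CELL), int(y_m / CELL)
    grid[i, j] = max(grid[i, j], belief)
    return grid

# A person standing at ground-plane position (3.1 m, 4.7 m).
grid = mark_occupied(grid, 3.1, 4.7)
```

In a multi-view system, each camera contributes evidence to such a grid, and the per-view evidence is then fused (here via Dempster-Shafer reasoning, Section III-B).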
Such maps can be estimated more accurately using a distributed camera network than with a single-viewpoint setup. However, besides raising privacy issues, the regular high-resolution cameras usually used in such camera networks make these systems expensive. Their high power consumption precludes battery operation and calls for more energy-efficient solutions. One possibility is the use of low-resolution visual sensor networks (e.g. mouse sensors) [1], [2], but their limited resolution could degrade the accuracy of occupancy maps. It is also not clear which resolution is sufficient to construct occupancy maps accurate enough for further processing.

In this paper, we simulate a visual sensor network to determine the minimal resolution needed to construct these maps. To do so, we used a regular camera network and resized the images to simulate low-resolution sensors (Fig. 1). In Section II we describe the data set used in our simulations, followed by the architecture used to obtain the measures that determine the minimal resolution (Section III). Finally, we summarize our results in Section IV.

Fig. 1. Input images were resized to different resolutions (frame 530, camera 3): (a) 256x190, (b) 128x96, (c) 64x48 and (d) 32x24 pixels.

II. DATA

To simulate low-resolution visual sensors, we used a camera network in an 8.8 m by 9.2 m room. The dataset contains four people walking around the room, observed by four cameras (780x580 pixels at 20 FPS) with overlapping views. Recordings were taken for about one minute, during which ground truth positions of each person were annotated at one-second intervals. These ground truth positions were used to measure the performance of our occupancy mapping and tracking at different image resolutions.

III. METHODS

A. Foreground detection using moving edges

To perform foreground/background segmentation, we used a method that detects moving edges via analysis of the image gradient.
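As a rough, toy sketch of such gradient-based moving-edge detection (not the authors' exact algorithm; the running-average update rule, the learning rates and the detection threshold below are all assumptions made for illustration):

```python
import numpy as np

def gradient_magnitude(frame):
    """Image gradient magnitude via finite differences."""
    gy, gx = np.gradient(frame.astype(float))
    return np.hypot(gx, gy)

class MovingEdgeDetector:
    """Toy moving-edge detector: short- and long-term background
    gradient models updated by recursive smoothing; pixels whose
    gradient exceeds both models are flagged as moving edges."""

    def __init__(self, alpha_short=0.5, alpha_long=0.05, thresh=10.0):
        self.alpha_short = alpha_short  # fast-adapting model
        self.alpha_long = alpha_long    # slow-adapting model
        self.thresh = thresh            # assumed detection threshold
        self.bg_short = None
        self.bg_long = None

    def update(self, frame):
        g = gradient_magnitude(frame)
        if self.bg_long is None:
            # First frame initializes both background models.
            self.bg_short = g.copy()
            self.bg_long = g.copy()
            return np.zeros_like(g, dtype=bool)
        # Moving edges: strong gradients absent from both models.
        moving = (g - np.maximum(self.bg_short, self.bg_long)) > self.thresh
        # Recursive smoothing (running average) of both models.
        self.bg_short = (1 - self.alpha_short) * self.bg_short + self.alpha_short * g
        self.bg_long = (1 - self.alpha_long) * self.bg_long + self.alpha_long * g
        return moving

det = MovingEdgeDetector()
static = np.zeros((48, 64))            # empty low-resolution frame
first = det.update(static)             # initializes the models
frame = static.copy()
frame[20:30, 20:30] = 255.0            # a bright object appears
mask = det.update(frame)               # its edges are flagged as moving
```

The resulting binary mask corresponds to the moving-edge pixels from which the foreground silhouettes are built, as described next.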
The method uses edge dependencies as statistical features of foreground and background regions and defines foreground as regions containing moving edges. The background is described by a short- and long-term image gradient model, updated by recursive smoothing. The foreground mask (silhouettes of moving people) is obtained by clustering the moving edges and combining them via a convex-hull technique (Fig. 2a-2d).

B. Dempster-Shafer based multi-view occupancy maps

The approach we followed (described in [3]) constructs an occupancy map based on Dempster-Shafer reasoning [4],