Deriving Implicit Indoor Scene Structure With Path Analysis

Xu Lu, Caixia Wang, Nader Karamzadeh, Arie Croitoru, Anthony Stefanidis
Center for Geospatial Intelligence
Dept. of Geography and Geoinformation Science
George Mason University
4400 University Drive, MS 6C3; Fairfax, VA 22030
+1-703-993-9237
{xlu5; cwangg; nshahnik; acroitor; astefani}@gmu.edu

ABSTRACT
Indoor video surveillance is now widely used in government, public, and private facilities. While the capacity to generate such video data is increasing, our ability to derive, from motion data alone, a coherent understanding of a scene's structure and of how the scene is being utilized still lags behind. This paper proposes a framework for deriving indoor scene structure and identifying abnormal motion behavior using only video tracking data, without requiring a floor plan. The proposed data-driven framework is based on four sequential processing steps, namely the detection of entrance and exit points, the analysis of the connectivity between entrance and exit points, the extraction of mean paths and motion corridors, and the statistical analysis of the length and velocity parameters of motion for the detection of abnormal motion behavior. The paper outlines the proposed framework and demonstrates its implementation using a real-world data set comprising 1138 trajectories.

Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding (Motion; Modeling and Recovery of Physical Attributes; Video Analysis; Representation, Data Structures, and Transforms).

General Terms
Algorithms, Measurement, Experimentation, Security, Human Factors.

Keywords
Video Surveillance, Tracking, Scene analysis, Motion corridor, Abnormal behavior.

1. INTRODUCTION
Video surveillance and monitoring cameras have become commonplace in indoor facilities, from hospitals, schools, banks, and shopping malls to airports and transit centers, government facilities, and military installations.
For example, it is estimated that approximately 10,000 publicly and privately owned video cameras have been deployed in Chicago (Illinois, USA), many of which monitor indoor spaces [5]. In the UK alone, an estimated 2 to 4 million CCTV cameras are deployed, over 500,000 of them in London [1][18]. This number is expected to grow further as the demand for tighter security and better safety increases, and as commercial and residential facilities continue to enhance their video surveillance and monitoring systems. At the same time, owing to recent technological advances in video sensors (e.g., CMOS and CCD), networking (in particular, wireless networking), and compression (e.g., MPEG-4), off-the-shelf surveillance systems have become affordable, easily deployable solutions for the general public.

However, these trends have not been matched by a comparable capacity to automatically process video datasets and extract meaningful knowledge from them. To a large extent, video analysis is still carried out manually by human analysts who rely heavily on their close familiarity with the monitored scene, on the human visual system, and on expert domain knowledge. Although the role of scene familiarity in tasks such as visual search is still not fully understood, in some situations such familiarity is clearly essential for limiting the search space, thus making the process more effective [11].

This paper focuses on the development of a data- and activity-driven analysis framework that will assist human analysts in quickly developing familiarity with a previously unknown monitored indoor scene. In particular, following [22], our interest is in developing an activity-driven approach for constructing indoor scene models that are driven primarily by functionality, describing the way in which a physical space is utilized (e.g., a popular path between two doors), rather than by the structural constraints of a scene (e.g.,
an indoor space that includes rooms, doors, and corridors). Furthermore, we assume that a priori knowledge of a floor plan is not readily available; therefore, the only available cue for deriving a scene model is how people move in the space. While floor plans can be useful for our analysis, it cannot be assumed that such plans are always available and accessible, e.g., in the case of an ad-hoc deployment of a surveillance system for rapid response. This approach allows us to relax the requirement for a priori knowledge of structural details (e.g., a floor plan), and is particularly suitable for open indoor scenes where structural limitations either do not exist or do not significantly affect the way people traverse the space. Such scenes are very different from typical outdoor scenes, where activity is largely driven by physical scene structures (e.g., roads, sidewalks, or trails). For example, because cars are restricted to road lanes, activity in the scene is physically constrained by the presence of a road. In open-space indoor scene analysis, on the other hand, activities reflect functionality.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISA ’11, 1 November 2011, Chicago, IL, USA. Copyright 2011 ACM 978-1-4503-1035-2 ...$10.00.

Learning and analyzing the patterns that emerge from the way people traverse the space makes it possible to identify motion corridors. We envision such corridors, much like car lanes, as commonly used paths formed by people moving from