A Distributed System for Supporting Spatio-temporal Analysis on Large-scale Camera Networks

Kirak Hong*, Marco Voelz†, Venu Govindaraju‡, Bharat Jayaraman‡, and Umakishore Ramachandran*

* Georgia Institute of Technology, {khong9, rama}@cc.gatech.edu
† University of Stuttgart, marco.voelz@ipvs.uni-stuttgart.de
‡ SUNY Buffalo, {govind, bharat}@buffalo.edu

Abstract—Cameras are becoming ubiquitous. Technological advances and the low cost of such sensors enable the deployment of large-scale camera networks in metropolises such as London and New York. Applications such as video-based surveillance and emergency response that exploit such camera networks are continuous, data intensive, and dynamic in terms of resource requirements. Common anomalies in such application spaces include personnel moving into unauthorized spaces; a related task is tracking the movement of suspicious individuals as they move through the spaces. The high-level goal in such applications is to catch such anomalies in real time and reduce collateral damage. A well-known technique for meeting this high-level goal is spatio-temporal analysis. This is an inferencing technique employed by domain experts (e.g., vision researchers) to answer queries such as "show the track of person A in the last 30 minutes." Performing spatio-temporal analysis in real time for a large-scale camera network is challenging. It involves continuously capturing images from distributed cameras, analyzing the images to detect and track objects of interest in the field of view of the cameras, generating an event by comparing the signature of a detected object against a database of known signatures, and maintaining a state transition table indexed by time that shows the spatio-temporal evolution of people's movement through the distributed spaces. In this paper, we propose a distributed system architecture to address these challenges.
We make the following contributions: (a) we present the design choices for real-time spatio-temporal analysis with a view to supporting scalability (in terms of number of cameras, event rate, and known targets); (b) we develop heuristics for pruning the event generation phase of spatio-temporal analysis; and (c) we implement and evaluate the different design choices in a distributed system to show the scalability of our distributed system architecture.

I. INTRODUCTION

As sensors for recognizing humans, such as cameras, voice recognition sensors, and RFID readers, become more capable and widely deployed, new application scenarios arise that require automated processing of continuous stream data to identify and track human beings in real time. Scenarios in this field include airport security, emergency response, and assisted living, all requiring real-time detection of unusual situations, called anomalies. Unlike techniques such as RFID badges, cameras allow for an unobtrusive way of identifying people's whereabouts, making them the primary source of information in many of these scenarios.

Take an airport scenario as an example: Amsterdam's Schiphol airport currently has 1,000 cameras in place and plans to increase that number to between 3,000 and 4,000 over the next few years [1]. In an airport, a common security violation is that an individual enters a restricted area without permission. If such a situation occurs, the individual should be reported to an airport security team in real time, preventing potential threats to the airport. Similarly, individuals who checked in their baggage but did not board their flight, or unattended baggage, are other examples of anomalous situations in an airport. The high-level goal in such applications, often referred to as situation awareness applications [2], is catching anomalies in real time and reducing collateral damage.
To achieve this, there is a well-known technique called spatio-temporal analysis, which enables an application to answer queries such as "Where is person A?", "When and where did person A leave zone X?", and "When and where did person A and person B meet for the last time?". Applications providing the means to answer these queries usually employ distributed cameras and sensors of other modalities (such as audio and biometrics) to detect people in the observed system. These live sensor streams are used to estimate the identity of each detected person by comparing the data against a set of well-known identities. The estimates generated throughout the system are gathered and regularly consolidated to create a global view of the observed area, e.g., by recording the most likely whereabouts of each person known to the system at a certain point in time. The current global state, and possibly a history of former states, enables the system to answer queries such as those stated above.

Recently, Menon et al. [3] showed the feasibility of spatio-temporal analysis using this concept by maintaining the global state in a transition table similar to hidden Markov models. The table represents the probability of each person known to the system being in each of the observed locations. Events,