Tracking Across Multiple Cameras With Disjoint Views

Omar Javed, Zeeshan Rasheed, Khurram Shafique, Mubarak Shah
Computer Vision Lab, University of Central Florida
{ojaved,zrasheed,khurram,shah}@cs.ucf.edu

Abstract

Conventional tracking approaches assume proximity in space, time and appearance of objects in successive observations. However, observations of objects are often widely separated in time and space when viewed from multiple non-overlapping cameras. To address this problem, we present a novel approach for establishing object correspondence across non-overlapping cameras. Our multi-camera tracking algorithm exploits the redundancy in the paths that people and cars tend to follow, e.g. roads, walkways or corridors, by using motion trends and the appearance of objects to establish correspondence. Our system does not require any inter-camera calibration; instead, it learns the camera topology and the path probabilities of objects using Parzen windows during a training phase. Once training is complete, correspondences are assigned using the maximum a posteriori (MAP) estimation framework. The learned parameters are updated as trajectory patterns change. Experiments with real-world videos are reported, which validate the proposed approach.

1. Introduction

Surveillance of wide areas requires a network of cameras, and it is not always possible for the camera views to overlap in this setting. In such a scenario, observations of the same object can be widely separated in time and space. Moreover, it is preferable that the tracking system not require camera calibration or complete site modelling, since the luxury of calibrated cameras or site models is not available in most situations. In this paper, we focus on the problem of multi-camera tracking in a system of non-overlapping, uncalibrated cameras. The task of a multi-camera tracker is to establish correspondence between observations of objects across cameras.
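To make the setting concrete, the tracker consumes per-camera track records and must decide which records from different cameras refer to the same real-world object. The following is a minimal sketch of such a record; all field names and the travel-time window are illustrative assumptions, not from the paper.

```python
# Hypothetical sketch of the per-camera track record that a multi-camera
# correspondence module might consume. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Observation:
    camera_id: str        # which camera produced this track
    entry_time: float     # time the object entered the view (seconds)
    exit_time: float      # time the object left the view (seconds)
    exit_location: tuple  # (x, y) image position at exit
    exit_velocity: tuple  # (vx, vy) image velocity at exit
    appearance: list      # e.g. a normalized color histogram

def is_candidate(a: Observation, b: Observation, max_gap: float = 60.0) -> bool:
    """A track in one camera can only correspond to a later track in a
    different camera within a plausible travel-time window."""
    gap = b.entry_time - a.exit_time
    return a.camera_id != b.camera_id and 0.0 < gap <= max_gap

obs_a = Observation("A", 0.0, 5.0, (310, 240), (4.0, 0.1), [0.2, 0.5, 0.3])
obs_b = Observation("B", 12.0, 20.0, (15, 250), (3.8, 0.0), [0.22, 0.48, 0.30])
print(is_candidate(obs_a, obs_b))  # True: B's entry is 7 s after A's exit
```

Only pairs passing such a gating test need be scored by the correspondence model, which keeps the assignment problem small.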
We assume that tracking information is available within each individual camera; the objective is to find correspondences between these single-camera tracks such that the corresponded tracks belong to the same object in the real world.

We use the observations of people moving through the system of cameras to discover the relationships between the cameras. For example, suppose two cameras, A and B, are arranged in succession alongside a walkway. People moving along one direction of the walkway who are initially observed in camera A are later observed entering camera B after a certain time interval. However, people moving in the opposite direction in camera A might never be observed in camera B. Thus, the usual locations of exits and entrances between cameras, the direction of movement, and the average time taken to travel from A to B can all be used to constrain correspondences. In this paper, we refer to these cues as space-time cues. Another cue for tracking is the appearance of persons as they move between cameras. We present a MAP estimation framework that uses these cues in a principled manner for tracking. We use Parzen windows, also known as kernel density estimators, to estimate the inter-camera space-time probabilities from the training data, i.e., the probability of an object entering a certain camera at a certain time given the location, time and velocity of its exit from other cameras. Using Parzen windows lets the data ‘speak for itself’ [13] rather than imposing assumptions. The change in appearance as a person moves between certain cameras is modelled using the distances between color models. The correspondence probability, i.e. the probability that two observations are of the same object, depends on both the space-time information and the appearance. Tracks are assigned by estimating the correspondences that maximize the posterior probabilities.
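As an illustration of the two cues above, the sketch below estimates the inter-camera transition-time density with a Parzen window and measures appearance similarity between color histograms, then combines them into a correspondence score. This is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the restriction to the time dimension alone, and the Bhattacharyya coefficient as the color-model distance are all assumptions for the sake of the example.

```python
# Sketch of the space-time and appearance cues (assumed forms, see above).
import math

def parzen_density(t, training_gaps, bandwidth=1.0):
    """Kernel density estimate of the transition-time probability, built
    from exit-to-entry time gaps collected during training."""
    n = len(training_gaps)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((t - g) / bandwidth) ** 2)
                      for g in training_gaps)

def bhattacharyya(h1, h2):
    """Similarity between two normalized color histograms (1 = identical)."""
    return sum(math.sqrt(p * q) for p, q in zip(h1, h2))

def correspondence_score(time_gap, training_gaps, hist_exit, hist_entry):
    # Treating the cues as independent, the posterior is proportional to
    # the product of the space-time term and the appearance term.
    return (parzen_density(time_gap, training_gaps)
            * bhattacharyya(hist_exit, hist_entry))

gaps = [6.5, 7.0, 7.2, 6.8, 7.5]  # training: seconds from camera A exit to B entry
h_a = [0.2, 0.5, 0.3]             # color histogram observed in camera A
h_b = [0.22, 0.48, 0.30]          # color histogram observed in camera B
print(correspondence_score(7.0, gaps, h_a, h_b) >
      correspondence_score(30.0, gaps, h_a, h_b))  # True: a 7 s gap fits training
```

A full system would extend the kernel estimate to exit location and velocity as well, as the paper describes, but the one-dimensional case already shows how the training data shapes the density without a parametric model.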
This is achieved by transforming the MAP estimation problem into the problem of finding a path cover of a directed graph, for which an efficient optimal solution exists.

The paper is organized as follows. We give an overview of related research in Section 2. A Bayesian formulation of the problem is presented in Section 3. The learning of path and appearance probabilities is discussed in Section 4. A method to find the correspondences that maximize the a posteriori probabilities is given in Section 5. The procedure to update the probabilistic models is given in Section 6. Results are presented in Section 7.

2. Related Work

A large amount of work on multi-camera surveillance assumes overlapping views. Jain and Wakimoto [9] used cal-