IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. X, NO. X, 2008

Efficient multi-target visual tracking using Random Finite Sets

Emilio Maggio, Murtaza Taj, Andrea Cavallaro

Abstract—We propose a filtering framework for multi-target tracking that is based on the Probability Hypothesis Density (PHD) filter and data association using graph matching. This framework can be combined with any object detector that generates positional and dimensional information about the objects of interest. The PHD filter compensates for missing detections and removes noise and clutter. Moreover, this filter reduces the growth in complexity with the number of targets from exponential to linear by propagating the first-order moment of the multi-target posterior, instead of the full posterior. In order to account for the nature of the PHD propagation, we propose a novel particle resampling strategy and we adapt the dynamic and observation models to cope with varying object scales. The proposed resampling strategy allows us to use the PHD filter when a priori knowledge of the scene is not available. Moreover, the dynamic and observation models are not limited to the PHD filter and can be applied to any Bayesian tracker that can handle State Dependent Variances (SDV). Extensive experimental results on a large standard video surveillance dataset using a standard evaluation protocol show that the proposed filtering framework improves the accuracy of the tracker, especially in cluttered scenes.

Index Terms—Video surveillance, clutter, tracking, multi-target, PHD filter, Monte Carlo methods.

I. INTRODUCTION

The growing adoption of video surveillance systems has recently been driven by hardware advances, such as camera miniaturization, digitization and increased availability of low-cost data storage.
However, the opportunities offered by automated video surveillance are not yet exploited due to the lack of accurate and efficient algorithms for data-mining, content retrieval, event detection and behavior analysis. The extraction of high-level information from surveillance video mainly relies on the analysis of lower-level video data, such as objects and their trajectories, which are generated by multi-target trackers. While reliable tracking is possible under constrained conditions, the problem of tracking in a generic unconstrained scenario (for example in a dense scene with uncontrolled illumination) is still unsolved.

E. Maggio, M. Taj and A. Cavallaro are with the Multimedia and Vision Group, Queen Mary, University of London, E1 4NS, United Kingdom (e-mail: {emilio.maggio, murtaza.taj, andrea.cavallaro}@elec.qmul.ac.uk). The authors acknowledge the support of the UK Engineering and Physical Sciences Research Council (EPSRC), under grant EP/D033772/1.

The multi-target visual tracking problem can be decomposed into two main tasks, namely the detection of the objects of interest in each frame and the association of unique identities to the detections over time. The major challenge in the estimation of the number of targets and their position is that the estimate is based on a set of uncertain observations (i.e., the detections). A target may fail to generate an observation when occluded, an additional observation may be generated by clutter, and observations from actual targets may be corrupted by noise, thus affecting the state estimator. A multiple object tracker must also account for target interactions and for the time-varying number of targets in the scene by modeling their birth (when a new target appears in the scene or is spawned from another target, such as a person stepping out of a car) and their death.
Although the complete modeling of the multi-target problem is possible, its computational cost inevitably grows exponentially with the number of targets.

A. Prior work

Bayesian recursion is a popular approach to filter noisy observations in single-target tracking [1], [2], [3]. The Bayes filter first predicts the target state based on a dynamical model and then updates the resulting density using the newly available observation. Two algorithms implementing this recursion are the Kalman Filter [4] and the Particle Filter (PF) [5]. Multi-target tracking requires the extension of these algorithms to cope with target birth and target death, clutter and missing observations (Tab. I). Although the multi-target state can be seen as a concatenation of single-target states, each modeled as a random variable [6], Bayes multi-target filtering is computationally intensive due to the increase of the state dimensionality with the number of targets. To alleviate this problem several approaches have been proposed, as described below.

One solution is to model the multi-target problem in the single-target state by propagating a mixture of single-target pdfs approximated by particles [7]. When a target appears in the scene, a new component of the mixture is initialized and then propagated independently. The birth event is governed by heuristics and it is not included in the filtering framework. The volume of the multi-target state sampled by PF can be reduced by assuming that the targets do not appear simultaneously and by modeling the birth as a Poisson process [8]. To reduce the computational cost, Markov Chain Monte Carlo methods can be used to better sample the multi-target density [9].

Although the above-mentioned approaches make the multi-target problem tractable, they do not account for clutter and missing observations. An attempt to alleviate these limitations is presented in [10], but in this case the number of visible targets is assumed to be known and fixed.
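
The predict/update recursion of the single-target Bayes filter mentioned at the start of this subsection can be illustrated with a minimal sketch of a scalar Kalman filter. It assumes a random-walk dynamical model and a direct noisy observation of the state. The function names, the noise variances `q` and `r`, and the use of `None` to mark a missing observation are illustrative assumptions and are not taken from the cited methods.

```python
def kalman_predict(mean, var, q):
    """Prediction: random-walk dynamics x_k = x_{k-1} + w, with Var(w) = q (assumed model)."""
    return mean, var + q


def kalman_update(mean, var, z, r):
    """Update: observation z = x + v, with Var(v) = r (assumed model)."""
    gain = var / (var + r)              # Kalman gain
    new_mean = mean + gain * (z - mean)  # correct the prediction with the innovation
    new_var = (1.0 - gain) * var         # uncertainty shrinks after an observation
    return new_mean, new_var


def run_filter(observations, mean0=0.0, var0=1.0, q=0.01, r=0.25):
    """Run the recursion over a sequence; None marks a missing observation."""
    mean, var = mean0, var0
    estimates = []
    for z in observations:
        mean, var = kalman_predict(mean, var, q)
        if z is not None:  # with no observation, only the prediction is applied
            mean, var = kalman_update(mean, var, z, r)
        estimates.append((mean, var))
    return estimates
```

In this sketch a missing observation leaves only the prediction step, so the variance of the estimate grows until the next observation arrives. Clutter, target birth and target death are not handled, which is the gap the multi-target extensions discussed here aim to close.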
Jump Markov Systems (JMS) approximated by PF have also been used to model the varying number of targets in the scene, clutter and missing detections [11], [12]. A JMS models the dependencies