Particle Filtering with Factorized Likelihoods for Tracking Facial Features
I. Patras and M. Pantic
Knowledge and Data Engineering Group
Delft University of Technology
Mekelweg 4, 2628 CD Delft
The Netherlands
Abstract
In recent years, particle filtering has been the dominant
paradigm for tracking facial and body features, recognizing
temporal events, and reasoning under uncertainty. A major
problem associated with it is that its performance deteriorates
drastically when the dimensionality of the state space
is high. In this paper, we address this problem for the case
in which the state space can be partitioned into groups of
random variables whose likelihoods can be independently evaluated.
introduce a novel proposal density which is the product of
the marginal posteriors of the groups of random variables.
The proposed method requires only that the interdependen-
cies between the groups of random variables (i.e. the pri-
ors) can be evaluated and not that a sample can be drawn
from them. We adapt our scheme to the problem of multiple
template-based tracking of facial features. We propose a
color-based observation model that is invariant to changes
in illumination intensity. We experimentally show that our
algorithm clearly outperforms multiple independent tem-
plate tracking schemes and auxiliary particle filtering that
utilizes priors.
1. Introduction
In recent years, particle filtering has been the dominant
paradigm [2, 3, 4, 5, 7, 8, 11] for tracking the
state of a temporal event given a set of noisy observations
up to the current time instant. Its ability
to simultaneously maintain multiple solutions, the so-called
particles, makes it particularly attractive when the noise in
the observations is not Gaussian, and robust to missing or
inaccurate data. However, a problem that has been reported
in this framework [1, 9] is that the performance deteriorates
drastically as the dimensionality of the state space
increases. Indeed, as the dimensionality of the
state space increases, a large number of the particles that are
propagated from the previous time instant are wasted in
areas where the likelihood of the observations is very low.
Therefore, a very large number of particles is necessary to
accurately track the state.
In this paper we propose a method that deals with the
above-mentioned problem in the case that the state α can
be partitioned into groups of random variables (i.e.
α = {α_1, ..., α_n}), such that the likelihood of the observations
at the current time instant, given each group α_i, can be
independently evaluated. We build on the particle filtering
framework, which involves the following three steps: a)
sample from p(α_{t-1} | Y_{t-1}), where α_{t-1} is the state at the
previous time instant and Y_{t-1} the history of observations, b)
propagate the samples via the transition
probability p(α_t | α_{t-1}), and c) evaluate a new weight for the
samples from the likelihood p(y_t | α_t). We propose a modified
scheme which can be summarized as follows. First, each
partition α_i is propagated and evaluated independently.
This creates a particle-based representation of the marginal
posterior p(α_i | Y_t). We
subsequently use these representations to sample from a proposal
function g(α) that is the product of the marginal posteriors,
g(α) = Π_i p(α_i | Y_t). Finally, each of the
particles produced in this way is reweighted by evaluating
the transition probability p(α_t | α_{t-1}), so that the set of particles
with their new weights represents the a posteriori probability
p(α_t | Y_t). In correspondence to standard particle filtering,
our approach requires only that the transition probability
p(α_t | α_{t-1}) can be evaluated, and not that it can be sampled
from. Thus, it allows easier modeling of the interdependencies
between the groups of random variables (for example,
with a Markov Random Field). Furthermore, since the
particles are sampled from the proposal function g, it is
guaranteed that their likelihood is not low and, therefore,
that the particles are not wasted in areas of the state space
with low likelihood.
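As an illustration, the scheme just described can be sketched in Python for a toy state with two scalar groups. The random-walk transition, the Gaussian likelihoods, and the offset-preferring coupling prior below are hypothetical stand-ins, not the paper's facial-feature models: each group is propagated and weighted independently to form a particle-based marginal posterior, joint particles are then drawn from the product of the marginals, and are finally reweighted by the prior that couples the groups.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(samples, weights, n):
    # Draw n samples with probability proportional to the weights.
    idx = rng.choice(len(samples), size=n, p=weights / weights.sum())
    return samples[idx]

def factorized_pf_step(groups, likelihoods, joint_prior, n=500):
    # groups      : list of 1-D arrays, one particle set per group alpha_i
    # likelihoods : list of callables p(y | alpha_i), one per group
    # joint_prior : callable encoding the interdependencies between groups
    marginals = []
    for particles, lik in zip(groups, likelihoods):
        # a) propagate each group independently (toy random-walk transition)
        propagated = particles + rng.normal(0.0, 0.1, size=len(particles))
        # b) weight by the group's own likelihood and resample, giving a
        #    particle-based representation of the marginal posterior
        marginals.append(resample(propagated, lik(propagated), n))
    # c) draw joint particles from the product-of-marginals proposal g(alpha)
    joint = np.stack(marginals, axis=1)            # each row = one joint particle
    # d) reweight by the prior that couples the groups
    w = joint_prior(joint)
    return joint, w / w.sum()

# Toy run: two coupled 1-D features observed near 1.0 and 2.0,
# with a prior preferring an offset of about 1.0 between them.
groups = [rng.normal(0.0, 1.0, 500), rng.normal(0.0, 1.0, 500)]
liks = [lambda a: np.exp(-0.5 * (a - 1.0) ** 2 / 0.05),
        lambda a: np.exp(-0.5 * (a - 2.0) ** 2 / 0.05)]
prior = lambda j: np.exp(-0.5 * (j[:, 1] - j[:, 0] - 1.0) ** 2 / 0.1)
joint, w = factorized_pf_step(groups, liks, prior)
estimate = (joint * w[:, None]).sum(axis=0)        # weighted posterior mean
```

Because the joint particles are drawn from the marginal posteriors, every particle already has non-negligible likelihood under each group's observation model; the final reweighting only has to account for the coupling between the groups.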
We experimentally verify our claims by applying the
proposed method to the problem of multiple template-based
tracking of facial features. We propose a color-based ob-
servation model that is invariant to changes in illumination
intensity and utilize learned priors of the relative configu-
rations of the facial features. We provide comparative ex-
perimental results with other particle filters on real image
sequences.
The remainder of the paper is organized as follows. In
Section 2 we concisely review related work and describe
the proposed particle filtering method in detail. In Section
Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR’04)
0-7695-2122-3/04 $ 20.00 © 2004 IEEE