A Multi-Object Particle Filter Tracking with a Dual Consistency Check: Application to Mid-Level Concept Detection in Videos

Yifan Zhou, Boris Mansencal, Jenny Benois-Pineau
Laboratoire Bordelais de Recherche en Informatique (LaBRI)
CNRS (UMR 5800), Université Bordeaux 1
351, cours de la Libération, 33405, Talence cedex, France

Abstract

A novel mid-level video indexing method based on detecting and tracking human faces is presented. Instead of detecting the faces on every frame, our method first detects the faces and then tracks them. Compared to our previous general-purpose tracking method, our approach is improved by: i) a Multi-Object model extension to track several objects in parallel; ii) a Dual Consistency Check by Kolmogorov-Smirnov test that signals a scene change, so that tracking stops and waits for the next detection; iii) temporal median filtering of the initial detections produced by the Viola & Jones detector. The combination of filtered detection and our tracking method, evaluated on an excerpt of the TRECVID 2009 database, increases the F-measure by 7% compared to the Viola & Jones detector alone.

1. Introduction

With the development of data storage capacities and computational power, multimedia databases have become a reality. Consequently, various applications have been proposed for the different aspects of multimedia: audio, video, image, text, etc. Finding efficient methods for content-based multimedia indexing and retrieval is among those applications.

In this paper, a mid-level video indexing method is presented. It consists of the automatic detection and tracking of a mid-level concept in video, such as human faces. Although we tackle the face detection and tracking problem, our method is applicable to any mid-level concept whose appearance remains consistent with the appearance model defined for it.

Face detection has been one of the key research areas for many years [4, 12].
Recently, the face has become one of the mid-level features used in content-based multimedia indexing and retrieval to support high-level concept detection. We can cite in particular the “classroom” and “demonstration & protest” high-level concepts of the TRECVID 2009 competition [7, 8], for which the face is a helpful mid-level feature.

One of the most popular face detection methods is the Viola & Jones detector, implemented in OpenCV [11, 1]. It is a cascade of boosted classifiers based on Haar-like features. Lienhart et al. improved this method by adding center-surround features and more edge and line features [5]. The experimental results, including our previous work for TRECVID 2008, show that this method works well when the faces are in either frontal or profile pose relative to the camera. However, when the faces change pose, the performance of the OpenCV detector decreases strongly. This is because the cascade classifiers are usually trained on either frontal or profile views. Besides, the method is not stable under illumination changes. This led us to combine the method [11] with our Particle Filter (PF) tracking method [13, 14]. The complete method we present in this paper was developed for the IRIM consortium's participation in TRECVID 2009 [7, 8].

The Multi-Resolution Particle Filter Tracking with a Consistency Check for Model Update (MRPF) [13, 14] that we proposed fits our problem: detect a semantic concept first and then track it based on its appearance model. The object states are updated over time in order to adapt their appearance to a model change, hence capturing progressive pose and lighting variations and occlusions. Indeed, the model update technique we developed matches the requirement of detecting a face whatever its pose, provided that it has been detected at least once.
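As an illustration of what a consistency check on an appearance model can look like, the sketch below computes a two-sample Kolmogorov-Smirnov statistic between the stored model and a new estimate, and accepts the estimate only when the test does not reject the hypothesis of a common distribution. It is a minimal sketch under stated assumptions, not the MRPF implementation: the function names `ks_statistic` and `is_consistent`, the use of raw gray-level samples as the appearance description, and the coefficient `c_alpha = 1.36` (the usual large-sample value for a 5% significance level) are all choices made here for illustration.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Empirical CDFs are step functions, so the largest gap is reached
    # at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def is_consistent(model_values, estimate_values, c_alpha=1.36):
    """Return True when the KS test does not reject 'same distribution'.

    c_alpha = 1.36 is an assumed coefficient (about 5% significance for
    large samples); the threshold actually used by the authors is not
    given in this section.
    """
    n, m = len(model_values), len(estimate_values)
    threshold = c_alpha * sqrt((n + m) / (n * m))
    return ks_statistic(model_values, estimate_values) <= threshold


# Hypothetical gray-level samples taken from a tracked face region.
model = list(range(100, 150))
slightly_changed = [v + 1 for v in model]   # gradual lighting change
scene_change = [v + 100 for v in model]     # very different content

print(is_consistent(model, slightly_changed))
print(is_consistent(model, scene_change))
```

With these hypothetical samples, the slightly shifted estimate is accepted and the strongly shifted one is rejected. In a tracker, a rejection would be the point at which the model update is skipped or the estimate is reinitialized.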
Moreover, the consistency check based on the Kolmogorov-Smirnov (KS) test handles incoherent face estimates between two successive frames: if the estimates of the appearance state of the tracked face in the two frames differ strongly, our system can reinitialize the object estimate by correcting its motion state.

The principles of our method are as follows. The initial face states are first detected by the OpenCV detector. They are then tracked on the following frames with MRPF. In addition to these two basic techniques, the most important improvements over our previous work on tracking presented in this paper are: i) a Multi-Object model to track sev-
978-1-4244-8027-2/10/$26.00 © 2010 IEEE CBMI’2010