A fast multi-scale covariance descriptor for object re-identification

Walid Ayedi a,b,*, Hichem Snoussi a, Mohamed Abid b
a Charles Delaunay Institute (FRE CNRS 2848), University of Technology of Troyes, 10010 Troyes, France
b Sfax University, National Engineering School of Sfax, 3052 Sfax, Tunisia

Article history: Available online 12 September 2011

Keywords: Real-life surveillance; Object re-identification; Person re-identification; Multi-scale image description; Region covariance

Abstract

In many surveillance systems, there is a need to determine whether a given object (person, group of persons, vehicle, ...) has already been observed over a network of cameras: this is the object re-identification problem. Solving it involves matching observations of objects across disjoint camera views. Uncalibrated fixed or mobile cameras with non-overlapping fields of view generate uncontrolled variations in viewpoint, background and lighting. In such situations, a robust and invariant image description is required. A multi-scale covariance image descriptor and a quadtree-based scheme are proposed to describe any object of interest. We describe a fast method for computing the multi-scale covariance descriptor. The descriptor is evaluated on a person re-identification task using the VIPeR dataset. We show that the proposed multi-scale approach outperforms existing mono-scale image description methods.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Object re-identification enables tracking an object through different disjoint camera views, either on-the-fly or retrospectively. This problem can be approached as a visual appearance matching problem. However, object appearance in disjoint camera views varies greatly due to variations in lighting conditions and changes in object orientation and pose. Furthermore, uncalibrated and moving cameras make this task substantially harder still.
In order to tackle this problem, two principal approaches are considered: learning and non-learning approaches. Non-learning approaches focus on compiling image feature sets as a template to describe an object, followed by template matching using a direct distance measure chosen independently from the data (Madden et al., 2007; Prosser et al., 2008). Learning approaches are based on training classifiers to select the most discriminative features of the object between different views (Gray and Tao, 2008). Other learning approaches reformulate the person re-identification problem as a ranking problem and learn a subspace where the potential true match is given the highest ranking, rather than relying on any direct distance measure (Zheng et al., 2010).

These two approaches are commonly based on capturing relevant image features, either by distributing features in a discriminative way or by selecting the most discriminative ones. Images are composed of features, and no particular scale or spatial frequency has special status in natural scenes. Therefore, a visual system, whether natural or artificial, should offer a certain uniformity in the representation of visual information over multiple scales. Feature extraction from multi-scale image representations has been used successfully for robust object detection and recognition. In Stanley (2008), the author shows that multi-scale feature description produces greater detection accuracy than mono-scale feature description. Furthermore, in Xiangxin et al. (2007) the author shows that face recognition accuracy is also improved in the multi-scale domain. In this work, we use a quadtree decomposition to extract multi-scale features. A combination of these features into a covariance matrix is used to describe image regions.

There are two main contributions in this paper. First, we propose a region covariance descriptor based on multi-scale features.
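As a minimal sketch of the region covariance idea the paper builds on, the following computes a covariance descriptor for an image region from a per-pixel feature vector. The feature set here (pixel coordinates, intensity, and gradient magnitudes) is an illustrative choice, not necessarily the exact feature set used in the paper:

```python
import numpy as np

def region_covariance(region):
    """Covariance descriptor of a grayscale image region.

    Per-pixel features: (x, y, intensity, |dI/dx|, |dI/dy|).
    Returns a symmetric d x d covariance matrix (here d = 5).
    """
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w]                    # pixel coordinates
    dy, dx = np.gradient(region.astype(float))     # image derivatives
    feats = np.stack([xs.ravel().astype(float),
                      ys.ravel().astype(float),
                      region.ravel().astype(float),
                      np.abs(dx).ravel(),
                      np.abs(dy).ravel()], axis=1)  # (n_pixels, 5)
    centered = feats - feats.mean(axis=0)
    # Unbiased sample covariance of the feature vectors
    return centered.T @ centered / (feats.shape[0] - 1)
```

The resulting matrix is small (d x d regardless of region size), symmetric positive semi-definite, and fuses spatial and appearance statistics in one descriptor, which is what makes it attractive for matching regions across views.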
A quadtree-based scheme composed of a collection of multi-scale descriptors is used to describe a whole object. Second, we introduce a fast algorithm to generate multi-scale features and descriptors.

The rest of the paper is structured as follows. First, we introduce the multi-scale feature structures. Then, the proposed multi-scale covariance descriptor is presented in Section 3, followed by its fast computation methodology. Finally, a detailed experimental evaluation is presented in Section 4, where the proposed descriptor is assessed using a quadtree-based matching technique.

2. Multi-scale feature description

The pyramid of images is the simplest multi-scale representation. It can be generated by a quadtree decomposition, which gives more flexibility to extract relevant multi-scale features.

* Corresponding author at: Charles Delaunay Institute (FRE CNRS 2848), University of Technology of Troyes, 10010 Troyes, France.
E-mail addresses: ayediwalid@yahoo.fr (W. Ayedi), hichem.snoussi@utt.fr (H. Snoussi), mohamed.abid@rnu.enis.tn (M. Abid).

Pattern Recognition Letters 33 (2012) 1902–1907. doi:10.1016/j.patrec.2011.09.006
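A quadtree decomposition of this kind can be sketched as a recursive split of the image into quadrants, collecting the region at every node so that each level of the tree corresponds to one scale. The fixed-depth split rule below is an illustrative assumption; the paper's own scheme may use a different stopping criterion:

```python
import numpy as np

def quadtree_regions(image, max_depth):
    """Recursively split an image into quadrants.

    Returns the bounding boxes (x, y, w, h) of all nodes at all
    levels; depth 0 is the whole image, depth 1 its four quadrants,
    and so on. A descriptor computed per box yields a multi-scale
    collection of region descriptors.
    """
    regions = []

    def split(x, y, w, h, depth):
        regions.append((x, y, w, h))
        if depth >= max_depth or w < 2 or h < 2:
            return
        hw, hh = w // 2, h // 2
        split(x,      y,      hw,     hh,     depth + 1)  # top-left
        split(x + hw, y,      w - hw, hh,     depth + 1)  # top-right
        split(x,      y + hh, hw,     h - hh, depth + 1)  # bottom-left
        split(x + hw, y + hh, w - hw, h - hh, depth + 1)  # bottom-right

    h, w = image.shape[:2]
    split(0, 0, w, h, 0)
    return regions
```

With this split rule a depth-D decomposition yields 1 + 4 + ... + 4^D regions, so the descriptor collection stays compact while still covering coarse and fine scales.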