A general method for appearance-based people search based on textual queries

Riccardo Satta, Giorgio Fumera, and Fabio Roli

Dept. of Electrical and Electronic Engineering, University of Cagliari
Piazza d’Armi, 09123 Cagliari, Italy
{riccardo.satta,fumera,roli}@diee.unica.it

Abstract. Person re-identification consists of recognising a person appearing in different video sequences, using an image as a query. We propose a general approach to extend appearance-based re-identification systems so that they also accept textual queries describing clothing appearance (e.g., “person wearing a white shirt and checked blue shorts”). This functionality can be useful, e.g., in forensic video analysis, when textual descriptions of individuals of interest given by witnesses are available instead of images. Our approach is based on turning any given appearance descriptor into a dissimilarity-based one. This allows us to build detectors of the clothing characteristics of interest using supervised classifiers trained in a dissimilarity space, independently of the original descriptor. Our approach is evaluated on a benchmark data set, using the descriptors of three different re-identification methods.

1 Introduction

Person re-identification is a computer vision task for video-surveillance applications. It consists of recognising a person appearing in different video sequences taken by one or more cameras, using an image as a query. Since the face region is usually small and people are often not in frontal pose, face recognition systems cannot be applied. Thus, methods proposed so far exploit clothing appearance [1] or soft biometrics like gait [2]. In this paper we consider a similar task that we call “appearance-based people search”. It consists of finding, among a set of images of individuals, the ones relevant to a textual query describing the clothing appearance of an individual of interest.
Thus, it differs from person re-identification, where the query is an image of the person of interest. This can be useful in applications like forensic video analysis, where a textual description of the individual of interest given by a witness may be available instead of an image.

To our knowledge, an analogous task (“person attribute search”) has so far been considered only in [3, 4]. In [3], the basic idea of building a specific detector for each attribute of interest (e.g., the presence of beard and eyeglasses, the dominant colour of torso and legs, etc.) was proposed, and a specific implementation was developed, mainly for face attributes. The work in [4] focused on the following attributes: gender, hair/hat colour, clothing colour, and bag