A mutual information based face clustering algorithm for movie content analysis ☆

N. Vretos ⁎, V. Solachidis 1, I. Pitas 1
Department of Informatics, University of Thessaloniki, Thessaloniki 54124, Greece

Article info
Article history: Received 13 December 2010; received in revised form 15 July 2011; accepted 29 July 2011
Keywords: Face clustering; Mutual information; Normalized cuts; Spectral graph analysis; Image processing

Abstract
This paper investigates facial image clustering, primarily for movie video content analysis with respect to actor appearance. Our aim is to use a novel formulation of the mutual information as a facial image similarity criterion and, by using spectral graph analysis, to cluster a similarity matrix containing the mutual information of facial images. To this end, we use the HSV color space of a facial image (more precisely, only the hue and saturation channels) in order to calculate the mutual information similarity matrix of a set of facial images. We make full use of the similarity matrix symmetries, so as to lower the computational complexity of the new mutual information calculation. We treat each row of this matrix as a feature vector describing a facial image, which yields a global similarity criterion for face clustering. In order to test our proposed method, we conducted two sets of experiments that produced a clustering accuracy of more than 80%. We also compared our algorithm with other clustering approaches, such as the k-means and fuzzy c-means (FCM) algorithms. Finally, in order to provide a baseline comparison for our approach, we compared the proposed global similarity measure with another one recently reported in the literature.
© 2011 Elsevier B.V. All rights reserved.

1. Introduction
Face clustering is a very important task for movie semantic extraction. It can contribute in many ways, for example by determining the principal actors, creating database references, or supporting dialog detection.
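As a rough illustration of the kind of similarity matrix summarized in the abstract, the following Python sketch estimates the standard Shannon mutual information between two quantized channels (for example, hue values) from their joint histogram and fills a symmetric pairwise matrix. It is only a sketch under stated assumptions: it uses the textbook definition of mutual information rather than the novel formulation proposed in this paper, the names `mutual_information` and `similarity_matrix` are hypothetical, and each image is assumed to be already aligned, equally sized, and flattened into a list of integer bin indices.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Standard Shannon mutual information (in bits) between two equal-length
    sequences of quantized pixel values, estimated from their joint histogram.
    Assumption: this is the textbook definition, not the paper's formulation."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram of co-located values
    count_a = Counter(a)        # marginal histogram of the first sequence
    count_b = Counter(b)        # marginal histogram of the second sequence
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (count_a[x] * count_b[y]))
    return mi


def similarity_matrix(images):
    """Pairwise mutual information matrix. Only the upper triangle is computed
    and then mirrored, because MI(a, b) == MI(b, a)."""
    m = len(images)
    sim = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            sim[i][j] = sim[j][i] = mutual_information(images[i], images[j])
    return sim


if __name__ == "__main__":
    # Toy "hue channels" with two quantization bins (hypothetical data).
    hue_1 = [0, 0, 1, 1]
    hue_2 = [0, 1, 0, 1]
    for row in similarity_matrix([hue_1, hue_2, hue_1]):
        print(row)
```

Mirroring the upper triangle roughly halves the number of pairwise evaluations. This is the simplest symmetry one can exploit and may differ from the symmetries the paper relies on. Each row of the resulting matrix could then serve as a feature vector for a clustering step.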
Moreover, face clustering can be used for unsupervised training of face recognition algorithms and, more generally, as a preprocessing step in any human-based image or video processing task, so as to create a human-wise categorization of the data. Facial image clustering groups together facial images that belong to the same person by employing a certain image similarity criterion. Let $P$ be a set of facial images. A clustering $C = \{C_i \mid C_i \subseteq P\}$ is a division of $P$ into facial image clusters $C_i$, for which the following conditions hold: $\bigcup_{C_i \in C} C_i = P$ and $\forall C_i, C_j \in C,\ i \neq j:\ C_i \cap C_j = \emptyset$. Ideally, the facial images in each cluster should belong to the same person. Face clustering is a very important application and can contribute in many ways to semantic movie analysis, e.g., for determining the movie cast or for assisting automatic dialog detection. Until now, few face clustering algorithms have been reported in the literature [1–4].

Face recognition and face clustering are two different tasks. In face recognition, we assume that we have a known number of persons and a training facial image database consisting of labeled facial images for each person. This database is used for training a face recognition classifier. Then, given a test video, each facial image extracted from a video frame can be tested by the already trained face recognition classifier, and the best-matching person ID (or rather a list of best-matching person IDs) is returned. In face clustering, the number of persons appearing in a video clip or movie is unknown and there is no training facial image database. Therefore, no training is possible. The goal of face clustering is entirely different from that of face recognition: given a number of video frames containing facial images, we have to find the unknown number of persons appearing therein, based on facial image similarities. Both face recognition and face clustering may share certain tools (e.g.
image similarity measures, face representation methods), but they differ in many respects, namely in their goals, methodology (training/no training) and performance metrics. Although a great amount of work has been conducted on face recognition, face clustering is a rather novel topic with few publications in the literature so far [1–4].

In [2], the authors proposed an approach for face clustering in video that involves the so-called Joint Manifold Distance (JMD). Therein, each subspace represents a set of facial images of the same person detected in consecutive frames. The clustering algorithm uses a sequence-to-sequence distance between facial video sequences and follows an agglomerative strategy. Another distance metric for clustering and classification algorithms, called the Affine Invariant Distance Measure (AIDM), was proposed in [3]. This distance function, which is invariant to affine transformations, is used in combination with partitioning-based algorithms for face clustering. In [4], Foucher et al. recommended a face clustering method based on face detection and tracking

Image and Vision Computing 29 (2011) 693–705
☆ This paper has been recommended for acceptance by Ioannis A. Kakadiaris.
⁎ Corresponding author. Tel./fax: +30 2310996304.
E-mail addresses: vretos@aiia.csd.auth.gr (N. Vretos), pitas@aiia.csd.auth.gr (I. Pitas).
1 Tel./fax: +30 2310996304.
Journal homepage: www.elsevier.com/locate/imavis
0262-8856/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2011.07.006