A mutual information based face clustering algorithm for movie content analysis ☆
N. Vretos ⁎, V. Solachidis 1, I. Pitas 1
Department of Informatics, University of Thessaloniki, Thessaloniki 54124, Greece
Article info
Article history: Received 13 December 2010; received in revised form 15 July 2011; accepted 29 July 2011
Keywords: Face clustering; Mutual information; Normalized cuts; Spectral graph analysis; Image processing
Abstract
This paper investigates facial image clustering, primarily for movie video content analysis with respect to
actor appearance. Our aim is to use a novel formulation of mutual information as a facial image similarity
criterion and, by means of spectral graph analysis, to cluster a similarity matrix containing the mutual
information of facial images. To this end, we use the HSV color space of a facial image (more precisely, only the
hue and saturation channels) to calculate the mutual information similarity matrix of a set of facial
images. We make full use of the symmetries of the similarity matrix, so as to lower the computational complexity of
the new mutual information calculation. We use each row of this matrix as a feature vector describing a facial
image, which yields a global similarity criterion for face clustering. To test the proposed method, we
conducted two sets of experiments, which produced a clustering accuracy of more than 80%. We also
compared our algorithm with other clustering approaches, such as the k-means and fuzzy c-means (FCM)
algorithms. Finally, to provide a baseline comparison for our approach, we compared the proposed
global similarity measure with another one recently reported in the literature.
© 2011 Elsevier B.V. All rights reserved.
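The pipeline outlined in the abstract can be illustrated with a minimal sketch. It does not reproduce the novel mutual information formulation of the paper, which is not given at this point; it assumes the standard Shannon mutual information estimated from joint histograms, and it assumes that the hue and saturation contributions are simply summed. The function names, the `bins` parameter and the input layout (equally sized arrays of shape (H, W, 2) with values in [0, 1]) are hypothetical choices made for illustration only.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Standard Shannon mutual information (in bits) between two equally
    sized channels with values in [0, 1], estimated from a joint histogram.
    Assumption: this is NOT the paper's novel formulation."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins,
                                 range=[[0, 1], [0, 1]])
    pxy = joint / joint.sum()              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def similarity_matrix(faces, bins=16):
    """faces: list of arrays of shape (H, W, 2) holding hue and saturation.
    Only the upper triangle is computed; the rest follows by symmetry."""
    n = len(faces)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Assumption: the two channel contributions are summed.
            mi = sum(mutual_information(faces[i][..., c], faces[j][..., c], bins)
                     for c in range(2))
            S[i, j] = S[j, i] = mi
    return S
```

Under these assumptions, row `S[i]` would play the role of the feature vector describing facial image `i`, and exploiting the symmetry roughly halves the number of mutual information evaluations.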
1. Introduction
Face clustering is a very important task for semantic information extraction from movies.
It can contribute in many ways, e.g., in determining the principal actors, creating database
references or detecting dialogs. Moreover, face clustering can be used for
unsupervised training of face recognition algorithms and, in general, as
a preprocessing step in any human-centered image and video processing
task, so as to create a per-person categorization of the data.
Facial image clustering groups together facial images that belong to
the same person by employing a certain image similarity criterion. Let
$P$ be a set of facial images. A clustering $C = \{C_i \mid C_i \subseteq P\}$ is a division of $P$
into facial image clusters $C_i$, for which the following conditions hold:
$\bigcup_{C_i \in C} C_i = P$ and $\forall C_i, C_j \in C,\ j \neq i : C_i \cap C_j = \emptyset$.
Ideally, all facial images in a cluster should belong to the same person. Face clustering is a very
important application and can contribute in many ways to semantic
movie analysis, e.g., for determining the movie cast or for assisting
automatic dialog detection. Until now, few face clustering algorithms
have been reported in the literature [1–4].
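The two conditions in the clustering definition (the clusters cover the whole image set and are pairwise disjoint) can be checked with a short sketch. The function name `is_clustering` and the use of string identifiers for facial images are assumptions made purely for illustration.

```python
def is_clustering(clusters, images):
    """Return True if `clusters` is a division of `images`, i.e. the
    clusters cover every image and no image lies in two clusters."""
    clusters = [set(c) for c in clusters]
    images = set(images)
    union = set().union(*clusters)
    covers = union == images
    # If the clusters are pairwise disjoint, their sizes add up to the
    # size of their union.
    disjoint = sum(len(c) for c in clusters) == len(union)
    return covers and disjoint
```

For example, `is_clustering([{"f1", "f2"}, {"f3"}], {"f1", "f2", "f3"})` holds, whereas a clustering in which `"f2"` appears in two clusters, or one that omits `"f3"`, does not satisfy the definition.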
Face recognition and face clustering are two different tasks: in face
recognition, we assume that we have a known number of persons and
a training facial image database, consisting of certain labeled facial
images per person. This database is used for training a face recognition
classifier. Then, if we have a test video, each facial image extracted
from a video frame can be tested by the already trained face
recognition classifier and the best matching person id (or rather a
list of best matching people ids) is returned. In face clustering, the
number of persons appearing in a video clip or movie is unknown and
there is no training facial image database. Therefore, no training is
possible. The face clustering goal is entirely different from that of face
recognition: given a number of video frames containing facial images,
we have to find the unknown number of persons appearing therein,
based on facial image similarities. Face recognition and face
clustering may share certain tools (e.g., image similarity measures,
face representation methods), but they differ in
goals, methodology (training/no training) and performance
metrics. Although a great amount of work has been conducted on face
recognition, face clustering is a rather novel topic with few
publications in the literature so far [1–4]. In [2], the authors
proposed an approach for face clustering in video that involves the
so-called Joint Manifold Distance (JMD). In that
method, each subspace represents a set of facial images of the
same person detected in consecutive frames. The clustering algorithm
uses a sequence-to-sequence distance between facial video sequences and follows
an agglomerative strategy. Another distance metric for clustering and
classification algorithms, called Affine Invariant Distance Measure
(AIDM) was proposed in [3]. This distance function, which is invariant
to affine transformations, is used in combination with partitioning-based
algorithms for face clustering. In [4], Foucher et al.
proposed a face clustering method based on face detection and tracking
Image and Vision Computing 29 (2011) 693–705
☆ This paper has been recommended for acceptance by Ioannis A. Kakadiaris.
⁎ Corresponding author. Tel./fax: +30 2310996304.
E-mail addresses: vretos@aiia.csd.auth.gr (N. Vretos), pitas@aiia.csd.auth.gr (I. Pitas).
1 Tel./fax: +30 2310996304.
0262-8856/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2011.07.006