Facial Component-Landmark Detection

B.A. Efraty, M. Papadakis, A. Profitt, S. Shah and I.A. Kakadiaris

Abstract— Landmark detection has proven to be a very challenging task in biometrics. In this paper, we address the task of facial component-landmark detection. By “component” we refer to a rectangular subregion of the face, containing an anatomical component (e.g., “eye”). We present a fully-automated system for facial component-landmark detection based on multi-resolution isotropic analysis and adaptive bag-of-words descriptors incorporated into a cascade of boosted classifiers. Specifically, first each component-landmark detector is applied independently and then the information obtained is used to make inferences for the localization of multiple components. The advantage of our approach is that it has robustness to pose as well as illumination. Our method has a failure rate lower than that of commercial software. Additionally, we demonstrate that using our method for the initialization of a point landmark detector results in performance comparable with that of state-of-the-art methods. All of our experiments are carried out using data from a publicly available database.

Index Terms— Steerable filters, landmark detection, face detection, cascade of classifiers, Bag-of-Words.

I. INTRODUCTION

Facial landmark detection has been an active area of research due to a multitude of potential applications including face recognition and facial expression analysis. Numerous methods have been proposed, most of which follow a learning-based approach using appearance or geometric constraints [1]. Among the state-of-the-art methods, approaches using Active Shape Model (ASM) and Active Appearance Model (AAM) have shown promising results under precise initializations in single or multiscale settings [2]. Obtaining an estimate of facial landmarks is an ill-conditioned problem due to pose, illumination, and expression variations.
These factors compromise the performance of most facial landmark detection methods, especially for non-frontal face images.

The inherent difficulties in point landmark estimation and detection motivate us to develop a new two-stage approach to solve this problem. By “component” landmark we refer to a subregion of the face, typically a rectangular window containing an anatomical component and the point landmarks associated with it. For instance, an “eye” component-landmark is a rectangle containing one of the eyes and point landmarks such as the “corner of the eye”. The two-stage point landmark detection strategy we propose starts with the detection of component-landmarks, and then uses their location to restrict or initialize the search for point landmarks. Component-landmarks can also be used for gesture recognition [3] or face detection under occlusions [4].

All authors are with the Computational Biomedicine Lab, University of Houston, TX, USA. This research was funded in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory (ARL) and by the University of Houston (UH) Eckhard Pfeiffer Endowment Fund. All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, the U.S. Government, or UH.

This paper is devoted to the study of automatic component-landmark detection. The key elements of our method are:
(I) A new multiresolution analysis, implementable with fast wavelet algorithms, which allows the design of isotropic filters aimed at detecting edges and identifying singularities at different scales and with varying degrees of smoothness, independent of spatial orientations. The localization of the isotropic filters we use reduces the influence of pose and illumination in the filtered outputs.
(II) The exploitation of sparsity in the output of the aforementioned filters to construct a codebook based on an adaptive implementation of the Bag-of-Words (BoW) approach, which is subsequently used to generate features.
(III) The use of cascaded classifiers for component-landmark identification.

Our contribution is the development of a facial component-landmark detection method that is robust to pose and illumination variation, as supported by our experiments. We evaluate the performance of our algorithms using a subset of the MultiPIE database [5], which includes a variety of test images with different poses and illuminations.

The rest of the paper is organized as follows: Section II reviews related work. Section III describes the methods used by our approach. Section IV presents the performance evaluation, and Section V provides conclusions.

II. PREVIOUS WORK

Existing methods for the detection of facial landmarks can be classified into two categories: generative approaches and discriminative approaches. The generative approaches attempt to fit a generative model of shape or texture to the input face image, optimizing over the multidimensional space of the model’s parameters. The discriminative approaches search for candidates for each of the predefined landmarks of the face using feature detection methods. They then combine results based on the topological configuration of the landmarks.

The most popular generative models are the AAM and the ASM proposed by Cootes et al. [2], [6]. The AAM employs a statistical model for shape and texture parameters, which allows generation of new instances of facial images. The algorithm uses the texture residual between the target and the estimated images to iteratively update the model’s parameters. The ASM employs only shape parameters and is guided by a local search around each point of the shape.

Even though a variety of approaches have been proposed, the majority of prior work in facial landmark detection is