Biologically Inspired Lighting Invariant Facial Identity Recognition

Bharath Ramesh, Liushu Huang, Chao Tian, Cheng Xiang, and Tong Heng Lee
Department of Electrical and Computer Engineering
National University of Singapore, Singapore 117576
Email: elexc@nus.edu.sg (Cheng Xiang)

Abstract—Over the past decade, a considerable amount of literature has been published on face recognition. Since recognition of frontal face images under controlled settings has become easy to achieve, a number of recent studies have emphasized the importance of robustness to variations in pose and illumination. In this paper, we undertake the task of recognizing face images taken under drastic lighting variations using a hierarchical facial identity recognition framework inspired by the human vision system. The proposed system employs a novel log-polar encoding of Gabor filtered outputs in order to extract scale- and rotation-invariant edge information from face images. Besides the novel encoding strategy, an explicit pre-processing step is proposed to deal with drastic lighting changes. We tested the facial identity recognition framework on two popular face databases, the Yale and ORL databases, and obtained recognition rates on par with recent works. Beyond these standard databases, we also tested on the Yale B database, which was specifically designed to test illumination invariance. In summary, we have outperformed state-of-the-art methods on the challenging Yale B face database using the proposed facial identity recognition framework.

I. INTRODUCTION

In general, face recognition systems can be categorized into two types: holistic matching and local matching. Holistic or global matching treats the whole face region as a single entity for classification (e.g. [1], [2]). On the other hand, local or component matching divides the face region into sub-regions and combines the local statistics for classifying the face image [3].
Of the two methods, local matching is less sensitive to pose changes and occlusion [4], and it is therefore adopted in this work. Within the local matching strategy, there are various encoding strategies for the local patches: for instance, some works use gray level alone, others use filtered responses [5] or encodings of filtered responses [6]. The encoding strategy serves two purposes: (1) to enforce invariant properties, and (2) to downsample the filtered patches so as to decrease computational time. In this paper, we address both goals using log-polar raster sampling of the Gabor filtered outputs, followed by spectral analysis using the Fourier transform.

The log-polar transform (LPT) [7] simulates the area of foveal vision in the human vision system, thereby achieving a conformal mapping equivalent to a single instance of a visual fixation. In the computer vision literature, the most successful application of log-polar mapping is the highly regarded shape descriptor, the shape context [8]. It should be noted, however, that the shape context builds log-polar histograms instead of using the original LPT, which samples the image exponentially at the intersections of the rings and wedges of the transform. Due to this exponential polar sampling, scale and rotation changes in the Cartesian image correspond to horizontal and vertical shifts in the log-polar domain, respectively [7]. Subsequently, the modulus of the Fourier transform can be used to enforce translation invariance, so that two log-polar images of similar facial parts will have "similar" Fourier transform magnitudes. Nevertheless, a similar representation neither eliminates noise nor guarantees discriminative properties. Feature extraction is meant to improve the performance of the classifier by discarding irrelevant information, such as noise and redundancy, from the set of input features [9].
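As a concrete illustration of this encoding idea, the NumPy sketch below samples an image patch at exponentially spaced rings and uniformly spaced wedges, then takes the Fourier transform modulus. The function names, ring/wedge counts, and nearest-neighbour sampling are illustrative assumptions, not the paper's implementation (which applies the LPT to Gabor filtered outputs rather than raw patches):

```python
import numpy as np

def log_polar_transform(img, n_rings=32, n_wedges=32):
    """Resample a square grayscale image at the intersections of
    exponentially spaced rings and uniformly spaced wedges
    (nearest-neighbour sampling, centred on the image centre)."""
    h, w = img.shape
    cy, cx = h / 2.0, w / 2.0
    r_max = min(cy, cx)
    # exponential radial spacing: radius grows as r_max ** (k / (n_rings - 1))
    radii = r_max ** (np.arange(n_rings) / (n_rings - 1))
    angles = np.linspace(0.0, 2 * np.pi, n_wedges, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    ys = np.clip((cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip((cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return img[ys, xs]  # rows index log-radius, columns index angle

def invariant_signature(img):
    """Fourier transform modulus of the log-polar image: scale and
    rotation in the Cartesian image become shifts in the log-polar
    domain, and the modulus discards those shifts."""
    return np.abs(np.fft.fft2(log_polar_transform(img)))
```

In this representation, a rotation of the Cartesian patch becomes a circular shift of the columns of the log-polar array, and a scale change approximates a shift of its rows, so the 2-D FFT modulus is (approximately) unchanged.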
While noise can readily be regarded as a hindrance to optimal classification, it has also been observed that if the number of training samples is far smaller than the feature dimension (the small sample size problem), the 'curse of dimensionality' degrades the performance of the classifier [10]. From the earliest methods, such as principal component analysis (PCA) [11] and Fisher's linear discriminant (FLD) [12], to recent variants such as the recursive Fisher's linear discriminant (RFLD) [13], the goal of feature extraction has been to project the input features onto a discriminant low-dimensional subspace. In this paper, we propose the combined use of PCA, FLD and RFLD to improve classifier accuracy by reducing the LPT feature dimension and eliminating noisy features. The feature extraction module is thus crucial for obtaining a discriminative representation of similar face images.

The other important component in the framework is the pre-processing needed to deal with drastic lighting variations. Three general approaches have been attempted to tackle lighting problems in face recognition: illumination normalization, illumination-invariant feature extraction, and face modeling [14]. The first approach aims to mitigate lighting variations by applying global image processing techniques such as histogram equalization and the logarithmic transform [15]. The second approach aims to extract facial features that are invariant to lighting conditions; edge maps and quotient images [5] are examples of features from this category. While these extracted features aim to be lighting invariant, they are unlikely to remain so when large lighting variations are present. The third approach uses images under different lighting conditions to construct a generative 3-D face illumination cone [16], which is then approximated by a low-dimensional linear subspace. A limitation of this approach is the need for training images under varying lighting conditions.
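The PCA-then-FLD stage of such a pipeline can be sketched as follows. This is a minimal NumPy version using the standard within-class and between-class scatter definitions; the RFLD stage is omitted and all names are illustrative, so this should be read as an assumed sketch rather than the paper's implementation:

```python
import numpy as np

def pca_fld(X, y, pca_dim=20):
    """PCA followed by Fisher's linear discriminant (sketch).
    X: (n_samples, n_features) row vectors, y: integer class labels.
    PCA is applied first so that the within-class scatter in the
    reduced space is non-singular (the small-sample-size problem)."""
    # --- PCA via SVD of the mean-centred data ---
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:pca_dim].T                 # (n_features, pca_dim)
    Z = Xc @ W_pca                         # PCA-projected data
    # --- FLD in the PCA subspace ---
    classes = np.unique(y)
    mu_z = Z.mean(axis=0)
    Sw = np.zeros((pca_dim, pca_dim))      # within-class scatter
    Sb = np.zeros((pca_dim, pca_dim))      # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        d = (mc - mu_z)[:, None]
        Sb += len(Zc) * (d @ d.T)
    # generalized eigenproblem Sb w = lambda Sw w
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    W_fld = evecs.real[:, order[:len(classes) - 1]]
    return W_pca @ W_fld, mu               # combined projection and mean

def project(X, W, mu):
    """Map raw feature vectors into the discriminant subspace."""
    return (X - mu) @ W
```

The FLD projection has at most C-1 columns for C classes, since the between-class scatter has rank at most C-1.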
However, as the training images are likely to be taken in a single setting, large lighting variations are unlikely to be available. Therefore, face modeling is not very appropriate for practical scenarios. In order to have a robust pre-processing method, we adopt illumination normalization to reduce lighting variations across images. In

978-1-4799-7862-5/15/$31.00 © 2015 IEEE
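A minimal illumination normalization of the kind referred to above (a logarithmic transform followed by histogram equalization) might look like the NumPy sketch below. The exact operations and their order in the proposed framework may differ, so this is an assumed sketch for grayscale uint8 images:

```python
import numpy as np

def log_transform(img):
    """Compress the dynamic range: bright regions are attenuated more
    than dark ones, reducing the effect of strong directional lighting."""
    out = np.log1p(img.astype(np.float64))
    out = 255.0 * (out - out.min()) / (out.max() - out.min() + 1e-12)
    return out.astype(np.uint8)

def histogram_equalization(img):
    """Remap grey levels so the cumulative intensity distribution
    becomes approximately uniform across the image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-12)
    lut = (255.0 * cdf).astype(np.uint8)
    return lut[img]  # look-up-table remapping of every pixel

def normalize_illumination(img):
    """Global pre-processing: log transform, then equalization."""
    return histogram_equalization(log_transform(img))
```

Both steps are global, so they require no training data and can be applied to a single probe image before feature extraction.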