Non-Lambertian Reflectance Modeling and Shape Recovery of Faces Using Tensor Splines

Ritwik Kumar, Member, IEEE, Angelos Barmpoutis, Member, IEEE, Arunava Banerjee, Member, IEEE, and Baba C. Vemuri, Fellow, IEEE

Abstract—Modeling illumination effects and pose variations of a face is of fundamental importance in the field of facial image analysis. Most of the conventional techniques that simultaneously address both of these problems work under the Lambertian assumption and thus fall short of accurately capturing the complex intensity variations that facial images exhibit, or of recovering their 3D shape in the presence of specularities and cast shadows. In this paper, we present a novel Tensor-Spline-based framework for facial image analysis. We show that, using this framework, the facial apparent BRDF field can be accurately estimated while seamlessly accounting for cast shadows and specularities. Further, using local neighborhood information, the same framework can be exploited to recover the 3D shape of the face (to handle pose variation). We quantitatively validate the accuracy of the Tensor Spline model using a more general model based on a mixture of single-lobed spherical functions. We demonstrate the effectiveness of our technique by presenting extensive experimental results for face relighting, 3D shape recovery, and face recognition using the Extended Yale B and CMU PIE benchmark data sets.

Index Terms—Tensor splines, non-Lambertian reflectance, face relighting, 3D shape recovery, facial image analysis.

1 INTRODUCTION

Precisely capturing the appearance and shape of objects has engaged human imagination ever since the conception of drawing and sculpting. With the invention of computers, a part of this interest was translated into the search for automated ways of accurately modeling and realistically rendering appearances and shapes. Among all of the objects explored via this medium, human faces have stood out for their obvious importance.
In recent times, the immense interest in facial image analysis has been fueled by applications like face recognition (on account of recent world events), pose synthesis, and face relighting (driven in part by the entertainment industry), among others. This in turn has led to tomes of literature on the subject, encompassing various techniques for modeling and rendering appearances and shapes of faces. Our understanding of the process of image formation and of the interaction between light and the facial surface has come a long way since we started [31], with many impressive strides along the way (e.g., [12], [23], [13]), but we are still some distance from an ideal solution.

In our view, an ideal solution to the problem of modeling and rendering appearances and shapes of human faces should be able to generate extremely photo-realistic renderings of a person's face, given just one 2D image of the face, in any desired illumination condition and pose, at the click of a button (in real time). Furthermore, such a system should not require any manual intervention and should not be fazed by the presence of common photo-effects, like shadows and specularities, in the input. Last, such an ideal system should not require expensive data collection tools and processes, e.g., 3D scanners, and should not assume the availability of meta-information about the imaging environment (e.g., lighting directions, lighting wavelength, etc.).

These general requirements have been singled out because the state of the art largely comprises systems that relax one or more of these conditions while satisfying the others. Common simplifying assumptions include the applicability of the Lambertian reflectance model (e.g., [12]), availability of a 3D face model (e.g., [9]), manual initialization (e.g., [7]), absence of cast shadows in the input images (e.g., [10]), availability of large amounts of data obtained from custom-built rigs (e.g., [23]), etc.
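As a point of contrast with the non-Lambertian model developed in this paper, the Lambertian assumption made by many of the methods cited above can be stated in a few lines. The sketch below is purely illustrative (the function name and the numeric values are ours, not from the paper): a Lambertian pixel's intensity depends only on the albedo and the angle between the surface normal and the light direction, which is why it cannot reproduce specular highlights or cast shadows.

```python
import numpy as np

def lambertian_intensity(albedo, normal, light):
    """Lambertian image formation at a single pixel:
    I = albedo * max(0, n . l).
    Intensity depends only on the cosine of the angle between the
    surface normal n and the light direction l; the max(0, .) clamp
    models attached shadow. There is no term for specularities or
    cast shadows, which is where the assumption breaks down for faces."""
    n = np.asarray(normal, dtype=float)
    l = np.asarray(light, dtype=float)
    n /= np.linalg.norm(n)
    l /= np.linalg.norm(l)
    return albedo * max(0.0, float(n @ l))

# Frontal normal, light 60 degrees off-axis: cos(60) = 0.5,
# so the intensity is about 0.8 * 0.5 = 0.4.
print(lambertian_intensity(0.8, [0, 0, 1], [np.sin(np.pi / 3), 0, np.cos(np.pi / 3)]))
```

A light placed behind the surface yields exactly zero under this model, whereas a real face could still receive indirect illumination; this rigidity is one motivation for the richer apparent-BRDF model the paper proposes.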
These assumptions are noted as "simplifying" because human faces are known to be neither exactly Lambertian nor convex (and thus can have cast shadows); fitting a 3D model requires time-consuming large-scale optimization with manual selection of features for initialization; specialized data acquisition can be costly; and, in most realistic applications, only a few images of a face are available.

The method we propose in this paper moves the state of the art closer to the ideal solution by satisfying more of the above-mentioned attributes simultaneously. Our technique can produce photo-realistic renderings of human faces across arbitrary illumination and pose using as few as nine images (fixed pose and known illumination directions) with a spatially varying non-Lambertian reflectance model. Unlike most techniques, our method does not require the input images to be free of cast shadows or specularities, and it can reproduce these effects in the novel renderings. It does not require any manual initialization and is a purely image-based technique (no expensive 3D scans are needed). Furthermore, it is capable of working with images obtained from

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 3, March 2011, p. 553.

The authors are with the Department of Computer and Information Science and Engineering, CSE Building, University of Florida, Gainesville, FL 32611. E-mail: {rkkumar, abarmpou, arunava, vemuri}@cise.ufl.edu.

Manuscript received 30 Jan. 2009; revised 30 Apr. 2009; accepted 4 Dec. 2009; published online 2 Mar. 2010. Recommended for acceptance by S.B. Kang. For information on obtaining reprints of this article, please send e-mail to tpami@computer.org, and reference IEEECS Log Number TPAMI-2009-01-0069. Digital Object Identifier no. 10.1109/TPAMI.2010.67.

0162-8828/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society.
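To give a concrete sense of how an apparent BRDF at a pixel can be estimated from nine images under known lighting, the following sketch fits a homogeneous polynomial of the light direction (equivalently, a symmetric Cartesian tensor contracted with the light direction) to per-pixel intensities by least squares. This is an illustration in the spirit of the paper's tensor model, not the authors' exact formulation: the polynomial order (2), the function names, and the synthetic data are our assumptions, and the actual Tensor Spline framework additionally couples neighboring pixels through spline weights.

```python
import itertools
import numpy as np

def monomials(l, d=2):
    # All degree-d monomials l_i * l_j * ... of the light direction l;
    # for d = 2 in 3 variables this gives 6 basis terms.
    return np.array([np.prod([l[i] for i in idx])
                     for idx in itertools.combinations_with_replacement(range(3), d)])

def fit_pixel_tensor(lights, intensities, d=2):
    """Least-squares fit of an order-d symmetric tensor (a homogeneous
    polynomial in the light direction) to one pixel's intensities
    observed under several known light directions."""
    A = np.stack([monomials(l, d) for l in lights])
    coeffs, *_ = np.linalg.lstsq(A, intensities, rcond=None)
    return coeffs

def predict(coeffs, light, d=2):
    # Relight the pixel: evaluate the fitted polynomial at a new direction.
    return float(monomials(light, d) @ coeffs)

# Illustration with nine unit light directions (the paper's input setting)
# and synthetic intensities generated from a known quadratic model.
rng = np.random.default_rng(7)
lights = rng.normal(size=(9, 3))
lights /= np.linalg.norm(lights, axis=1, keepdims=True)
true_coeffs = np.array([0.9, 0.1, 0.0, 0.7, 0.2, 1.1])
intensities = np.array([monomials(l) @ true_coeffs for l in lights])
fitted = fit_pixel_tensor(lights, intensities)
# With 9 observations and 6 unknowns, the fit recovers true_coeffs
# up to numerical precision, and predict() relights the pixel under
# any novel light direction.
```

A polynomial model of this kind, unlike the Lambertian cosine, can represent single-image lobes that peak away from the normal, which is the behavior needed to account for specular reflection.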