Human Face Modeling and Recognition Through Multi-View High Resolution
Stereopsis
Xin Chen, Timothy Faltemier, Patrick Flynn, Kevin Bowyer
University of Notre Dame
Notre Dame, IN 46556 USA
{xchen2, tfaltemi, flynn, kwb}@cse.nd.edu
Abstract
This paper presents a novel approach to face recogni-
tion that relies on 2D images to successfully reconstruct 3D
shape of the human face. This approach ultimately outperforms recognition based on 3D shape obtained from a commercial scanner. Additionally, our approach improves 2D recognition performance from 93.29% to 97.32%. Specifically, we employ multiple 2D views of a subject’s face to reconstruct several 3D
models through binocular stereopsis. We use the ICP (It-
erative Closest Point) algorithm to match the 3D probe to
the 3D gallery for each view, thereby forming a voting com-
mittee of multiple members to determine the final match-
ing score. We achieve an 85.23% rank-one recognition rate
on our data set consisting of 149 distinct subjects, superior
to the performance of a commercial 3D scanner. This is
noteworthy given that our approach does not require strict
calibration as in the case of the commercial 3D scanner.
Also significant is the demonstrated flexibility of this sys-
tem to successfully perform 3D recognition on a database
acquired originally for 2D face recognition.
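The matching scheme summarized above, per-view ICP alignment of a 3D probe to each 3D gallery model followed by a committee vote, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the point-to-point ICP variant, the RMS closest-point score, and the function names (icp_rms, committee_match) are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping point set P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def icp_rms(probe, gallery, iters=20):
    """Align the probe cloud to the gallery cloud with point-to-point ICP;
    return the final RMS closest-point distance as a matching score (lower = better)."""
    tree = cKDTree(gallery)
    P = probe.copy()
    for _ in range(iters):
        _, idx = tree.query(P)                   # closest-point correspondences
        R, t = best_rigid_transform(P, gallery[idx])
        P = P @ R.T + t
    dist, _ = tree.query(P)
    return float(np.sqrt(np.mean(dist ** 2)))

def committee_match(probe_views, gallery_models):
    """Each reconstructed probe view votes for its best-scoring gallery model;
    the identity with the most votes wins."""
    votes = [int(np.argmin([icp_rms(p, g) for g in gallery_models]))
             for p in probe_views]
    return max(set(votes), key=votes.count)
```

In practice each probe view would be a stereo reconstruction from a different camera pair; here the committee simply takes a majority over the per-view rank-one matches.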
1. Introduction
Identifying an individual by his or her face is among the least intrusive modalities in biometrics. Three-dimensional imagery is an intriguing sensory modality for
face recognition systems. It may offer superior performance
because it is robust to environmental variations,
and it is arguably less vulnerable to deliberate attempts to
obscure identity. However, current commercial 3D scan-
ners cannot operate with the same flexibility as 2D cam-
eras when used under varied lighting, depth of field, and
timing conditions [4]. Consequently, 3D face imaging re-
quires greater cooperation on the part of the subject. Also,
some 3D sensor hardware, such as the Minolta Vivid 910,
is “active” in the sense that it projects light of some type
onto the subject. The cost-effectiveness of 2D cameras is
another significant advantage, because state-of-the-art 3D
sensors would be cost-prohibitive for some consumers and
researchers.
Horace et al. [9] present a scheme for constructing a 3D
head model from two orthogonal views. They instantiate
a generic 3D head model based on a set of facial features.
Next, they generate a distortion vector field that deforms
the generic model. The combined input of the two facial
images is blended and texture-mapped onto the 3D head
model. The contribution of their research is limited by their
assumption that the camera’s projections are orthographic.
Chen et al. [5] rely on a fundamental matrix estimate to build a 3D human frontal face model
from two photographs. Their approach first estimates the
fundamental matrix [13], next rectifies the image pair and
matches the images to generate the disparity map, and fi-
nally, infers the 3D shape. Although these researchers created aesthetic face models by interactively adjusting the focal length, such interaction is likely too labor-intensive and subjective for a practical face recognition system.
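The first stage of this pipeline, estimating the fundamental matrix from point correspondences [13], can be sketched with Hartley's normalized 8-point algorithm; rectification and disparity computation would follow. This numpy sketch is illustrative only and is not the implementation of [5]; the function names are ours.

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: move the centroid to the origin and scale so the
    mean distance from the origin is sqrt(2); return points and the 3x3 transform T."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def eight_point(x1, x2):
    """Estimate the fundamental matrix F (unit Frobenius norm) from >= 8
    correspondences so that x2_h^T F x1_h = 0 in homogeneous pixel coordinates."""
    n1, T1 = normalize(x1)
    n2, T2 = normalize(x2)
    # Each correspondence gives one row of the linear system A f = 0.
    A = np.column_stack([
        n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
        n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
        n1[:, 0], n1[:, 1], np.ones(len(n1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                 # null vector = stacked F
    U, S, Vt = np.linalg.svd(F)              # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                        # undo the normalizing transforms
    return F / np.linalg.norm(F)
```

Given F, the image pair can be rectified so that epipolar lines become horizontal, which reduces stereo matching to a 1D search along scanlines.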
Medioni et al. [10] have designed a system to perform
stereo matching on two images taken with an angular base-
line of a few degrees for face authentication. The cameras
are calibrated, both internally and externally. They maintain
that the face can move up to about 30cm from its optimal
distance to the cameras, without noticeable change of qual-
ity. They validate their 3D recognition engine on all possi-
ble pairs from a database of 100 subjects, each acquired in 7
different poses within ±20 degrees of a frontal view. They
yielded an equal error rate better than 2%.
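The equal error rate quoted above is the operating point at which the false accept rate equals the false reject rate. A minimal sketch of how it can be computed from genuine and impostor similarity scores follows; the function name and the higher-is-better score convention are our assumptions.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep the accept threshold over all observed scores and
    return the smallest attainable max(FAR, FRR). Scores are similarities,
    so a comparison is accepted when its score >= threshold."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostor pairs wrongly accepted
        frr = np.mean(genuine < t)     # genuine pairs wrongly rejected
        best = min(best, max(far, frr))
    return best
```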
The 3Q scanner [1] uses stereo that is assisted by pro-
jecting a speckle pattern onto the scene. This style of active
stereo tends to be more resilient to variations in lighting conditions and enables the use of a wider range of camera sensors, because the controlled random texture is only momentarily projected onto the surface of the subject.
The performance of PCA-based 2D intensity face recog-
nition will generally improve when the training set is ex-
panded. However, a large training set is not always pos-
Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06)
0-7695-2646-2/06 $20.00 © 2006 IEEE