Human Face Modeling and Recognition Through Multi-View High Resolution
Stereopsis
Xin Chen, Timothy Faltemier, Patrick Flynn, Kevin Bowyer
University of Notre Dame
Notre Dame, IN 46556 USA
{xchen2, tfaltemi, flynn, kwb}@cse.nd.edu
Abstract
This paper presents a novel approach to face recogni-
tion that relies on 2D images to successfully reconstruct 3D
shape of the human face. This approach ultimately outperforms recognition based on 3D shape obtained from a commercial scanner. Additionally, our approach improves 2D recognition performance from 93.29% to 97.32%. Specifically, we employ multiple 2D views of a subject’s face to reconstruct several 3D
models through binocular stereopsis. We use the ICP (It-
erative Closest Point) algorithm to match the 3D probe to
the 3D gallery for each view, thereby forming a voting com-
mittee of multiple members to determine the final match-
ing score. We achieve an 85.23% rank-one recognition rate
on our data set consisting of 149 distinct subjects, superior
to the performance of a commercial 3D scanner. This is
noteworthy given that our approach does not require strict
calibration as in the case of the commercial 3D scanner.
Also significant is the demonstrated flexibility of this sys-
tem to successfully perform 3D recognition on a database
acquired originally for 2D face recognition.
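The matching scheme summarized above, per-view ICP alignment of a 3D probe to each 3D gallery model followed by a committee vote, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the point-to-point ICP variant, the RMS closest-point score, and the function names (icp_rms, committee_match) are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping point set P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def icp_rms(probe, gallery, iters=20):
    """Align the probe cloud to the gallery cloud with point-to-point ICP;
    return the final RMS closest-point distance as a matching score (lower = better)."""
    tree = cKDTree(gallery)
    P = probe.copy()
    for _ in range(iters):
        _, idx = tree.query(P)                   # closest-point correspondences
        R, t = best_rigid_transform(P, gallery[idx])
        P = P @ R.T + t
    dist, _ = tree.query(P)
    return float(np.sqrt(np.mean(dist ** 2)))

def committee_match(probe_views, gallery_models):
    """Each reconstructed probe view votes for its best-scoring gallery model;
    the identity with the most votes wins."""
    votes = [int(np.argmin([icp_rms(p, g) for g in gallery_models]))
             for p in probe_views]
    return max(set(votes), key=votes.count)
```

In practice each probe view would be a stereo reconstruction from a different camera pair; here the committee simply takes a majority over the per-view rank-one matches.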
1. Introduction
Identifying an individual by his or her face is among the least intrusive modalities in biometrics. Three-dimensional imagery is an intriguing sensory modality for
face recognition systems. It may offer superior performance
because it is robust to environmental variations,
and it is arguably less vulnerable to deliberate attempts to
obscure identity. However, current commercial 3D scan-
ners cannot operate with the same flexibility as 2D cam-
eras when used under varied lighting, depth of field, and
timing conditions [4]. Consequently, 3D face imaging re-
quires greater cooperation on the part of the subject. Also,
some 3D sensor hardware, such as the Minolta Vivid 910,
is “active” in the sense that it projects light of some type
onto the subject. The cost-effectiveness of 2D cameras is
another significant advantage, because state-of-the-art 3D
sensors would be cost-prohibitive for some consumers and
researchers.
Horace et al. [9] present a scheme for constructing a 3D
head model from two orthogonal views. They instantiate
a generic 3D head model based on a set of facial features.
Next, they generate a distortion vector field that deforms
the generic model. The combined input of the two facial
images is blended and texture-mapped onto the 3D head
model. The contribution of their research is limited by their
assumption that the camera’s projections are orthographic.
Chen et al. [5] rely on a fundamental matrix estimate to build a 3D human frontal face model
from two photographs. Their approach first estimates the
fundamental matrix [13], next rectifies the image pair and
matches the images to generate the disparity map, and fi-
nally, infers the 3D shape. Although these researchers created aesthetic face models by interactively adjusting the focal length, such interaction is likely too labor-intensive and subjective for a practical face recognition system.
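The first stage of this pipeline, estimating the fundamental matrix from point correspondences [13], can be sketched with Hartley's normalized 8-point algorithm; rectification and disparity computation would follow. This numpy sketch is illustrative only and is not the implementation of [5]; the function names are ours.

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: move the centroid to the origin and scale so the
    mean distance from the origin is sqrt(2); return points and the 3x3 transform T."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def eight_point(x1, x2):
    """Estimate the fundamental matrix F (unit Frobenius norm) from >= 8
    correspondences so that x2_h^T F x1_h = 0 in homogeneous pixel coordinates."""
    n1, T1 = normalize(x1)
    n2, T2 = normalize(x2)
    # Each correspondence gives one row of the linear system A f = 0.
    A = np.column_stack([
        n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
        n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
        n1[:, 0], n1[:, 1], np.ones(len(n1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                 # null vector = stacked F
    U, S, Vt = np.linalg.svd(F)              # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                        # undo the normalizing transforms
    return F / np.linalg.norm(F)
```

Given F, the image pair can be rectified so that epipolar lines become horizontal, which reduces stereo matching to a 1D search along scanlines.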
Medioni et al. [10] have designed a system to perform
stereo matching on two images taken with an angular base-
line of a few degrees for face authentication. The cameras
are calibrated, both internally and externally. They maintain
that the face can move up to about 30cm from its optimal
distance to the cameras, without noticeable change of qual-
ity. They validate their 3D recognition engine on all possi-
ble pairs from a database of 100 subjects, each acquired in 7
different poses within ±20 degrees of a frontal view. They
yielded an equal error rate better than 2%.
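The equal error rate quoted above is the operating point at which the false accept rate equals the false reject rate. A minimal sketch of how it can be computed from genuine and impostor similarity scores follows; the function name and the higher-is-better score convention are our assumptions.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep the accept threshold over all observed scores and
    return the smallest attainable max(FAR, FRR). Scores are similarities,
    so a comparison is accepted when its score >= threshold."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostor pairs wrongly accepted
        frr = np.mean(genuine < t)     # genuine pairs wrongly rejected
        best = min(best, max(far, frr))
    return best
```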
The 3Q scanner [1] uses stereo that is assisted by pro-
jecting a speckle pattern onto the scene. This style of active
stereo tends to be more resilient to variations in lighting conditions and enables the use of a wider range of camera sensors, because the controlled random texture is only momentarily projected onto the surface of the subject.
The performance of PCA-based 2D intensity face recog-
nition will generally improve when the training set is ex-
panded. However, a large training set is not always pos-
Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06)
0-7695-2646-2/06 $20.00 © 2006 IEEE