IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 3, JUNE 2004 423
Wide Baseline Image Registration With
Application to 3-D Face Modeling
Amit K. Roy-Chowdhury, Rama Chellappa, Fellow, IEEE, and Trish Keaton
Abstract—Establishing correspondence between features in
two images of the same scene taken from different viewing angles
is a challenging problem in image processing and computer vision.
However, its solution is an important step in many applications
like wide baseline stereo, three-dimensional (3-D) model align-
ment, creation of panoramic views, etc. In this paper, we propose
a technique for registration of two images of a face obtained from
different viewing angles. We show that prior information about
the general characteristics of a face obtained from video sequences
of different faces can be used to design a robust correspondence
algorithm. The method works by matching two-dimensional (2-D)
shapes of the different features of the face (e.g., eyes, nose, etc.). A
doubly stochastic matrix, representing the probability of match
between the features, is derived using the Sinkhorn normalization
procedure. The final correspondence is obtained by minimizing
the probability of error of a match between the entire constellation
of features in the two sets, thus taking into account the global
spatial configuration of the features. The method is applied to
create holistic 3-D models of a face from partial representations.
Although this paper focuses primarily on faces, the algorithm can
also be used for other objects with small modifications.
Index Terms— Biometrics, face modeling, feature correspon-
dence, image registration.
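The Sinkhorn normalization mentioned in the abstract can be illustrated with a minimal sketch. It alternately rescales the rows and columns of a strictly positive square matrix until both sum to one, which yields a doubly stochastic matrix. The function name `sinkhorn`, the stopping tolerance, and the example similarity values are assumptions made for illustration; how the initial matrix is built from the 2-D feature shapes is not specified in this part of the paper.

```python
def sinkhorn(scores, max_iters=500, tol=1e-10):
    """Alternately rescale rows and columns of a strictly positive square matrix."""
    m = [[float(v) for v in row] for row in scores]
    n = len(m)
    for _ in range(max_iters):
        # Row step: make every row sum to 1.
        row_sums = [sum(row) for row in m]
        m = [[m[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Column step: make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
        # Columns are now normalized; stop once the rows are also close to 1.
        if max(abs(sum(row) - 1.0) for row in m) < tol:
            break
    return m


# Hypothetical similarity scores between three features in each image
# (values invented for this example, not taken from the paper).
similarity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
    [0.1, 0.3, 0.7],
]
match_prob = sinkhorn(similarity)
for row in match_prob:
    print([round(v, 3) for v in row])
```

Each entry of `match_prob` can then be read as a match probability between one feature in the first image and one in the second, with every feature's probabilities summing to one in both directions.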
I. INTRODUCTION
Establishing correspondence between features in two
images of the same scene taken from different viewing
angles is a challenging problem in image processing and com-
puter vision. The difficulty of the problem is compounded by
the fact that the images may be obtained under different con-
ditions of lighting and camera settings. However, its solution
is an important step in many applications like wide baseline
stereo, three-dimensional (3-D) model alignment, creation of
panoramic views, etc. Numerous methods have been proposed for
this problem, ranging from techniques that exploit knowledge of
the scene geometry to ones that use information-theoretic measures
to compute similarity.
Manuscript received April 15, 2002; revised September 21, 2002. This work
was supported in part by the National Science Foundation under Grant 0086075.
The associate editor coordinating the review of this manuscript and approving
it for publication was Dr. Chalapathy Neti.
A. K. Roy-Chowdhury was with the Center for Automation Research, University
of Maryland, College Park, MD 20742 USA. He is now with the Department
of Electrical Engineering, University of California, Riverside, CA 92521 USA
(e-mail: amitrc@ee.ucr.edu).
R. Chellappa is with the Department of Electrical and Computer Engineering
and the Center for Automation Research, University of Maryland, College Park,
MD 20742 USA (e-mail: rama@cfar.umd.edu).
T. Keaton is with the Department of Signal and Image Processing, HRL
Laboratories LLC, Malibu, CA 90265 USA (e-mail: pakeaton@hrl.com).
Digital Object Identifier 10.1109/TMM.2004.827511
A. Literature Review
One of the well-known methods for registration is the iterative
closest point (ICP) algorithm [1] of Besl and McKay. It minimizes
a mean-square distance metric and converges monotonically
to the nearest local minimum. It was used for registering
3-D shapes by considering the full six degrees of freedom in the
motion parameters. It has been extended to include the Leven-
berg–Marquardt nonlinear optimization and robust estimation
techniques to minimize the registration error [2]. Another
well-known method for registering 3-D shapes is that of Vemuri
and Aggarwal, who used range and intensity data to reconstruct
complete 3-D models from partial ones [3].
Registering range data for the purpose of building surface models
of 3-D objects was also the focus of the work in [4]. Matching
image tokens across triplets, rather than pairs, of images has
also been considered. In [5], the authors developed a robust es-
timator for the trifocal tensor based upon corresponding tokens
across an image triplet. This was then used to recover 3-D struc-
ture. Reconstructing 3-D structure was also considered in [6]
using stereo image pairs from an uncalibrated video sequence.
However, most of these algorithms work only when given good initial
conditions; e.g., for 3-D model alignment, the partial models must
first be brought into approximate alignment. The problem of automatic
“crude” registration (in order to obtain good initial conditions)
was addressed in [7], where the authors used bitangent
curve pairs which could be found and matched efficiently.
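The dependence of ICP-style methods on good initial conditions can be seen in a minimal sketch of the basic iteration. The version below is a simplified 2-D illustration, not the full 3-D, six-degree-of-freedom formulation of [1]: it alternates nearest-neighbor matching with a closed-form least-squares rigid fit. The helper names `best_rigid_2d` and `icp_2d`, the point coordinates, and the initial misalignment are all assumptions chosen for this example.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Dot and cross terms of the centered point pairs.
    a = sum((p[0] - sx) * (q[0] - dx) + (p[1] - sy) * (q[1] - dy)
            for p, q in zip(src, dst))
    b = sum((p[0] - sx) * (q[1] - dy) - (p[1] - sy) * (q[0] - dx)
            for p, q in zip(src, dst))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated src centroid onto the dst centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(src, dst, iters=20):
    """Alternate nearest-neighbor matching and rigid fitting; return points and errors."""
    cur = list(src)
    errors = []
    for _ in range(iters):
        # Match every current point to its closest point in dst.
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matched)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
        # Mean-square distance to the matched points after the update.
        errors.append(sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                          for p, q in zip(cur, matched)) / len(cur))
    return cur, errors


# Hypothetical target shape and a slightly rotated, shifted copy of it.
dst = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0), (-1.0, 1.0)]
c0, s0 = math.cos(0.1), math.sin(0.1)
src = [(c0 * x - s0 * y + 0.2, s0 * x + c0 * y - 0.1) for x, y in dst]

aligned, errors = icp_2d(src, dst)
print(errors[0], errors[-1])
```

In this example the initial misalignment is small relative to the spacing of the points, so the first nearest-neighbor matches are already correct. With a larger initial rotation the matches can be wrong and the iteration may settle in a local minimum, which is why a crude initial registration is needed.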
In the above methods, geometric properties are used to align
3-D shapes. Another important area of interest for registration
schemes is two-dimensional (2-D) image matching, which can
be used for applications like image mosaicing, retrieval from
a database, medical imaging, etc. Two-dimensional matching
methods rely on extracting features or interest points. In
[8], the authors show that interest points are stable under
different geometric transformations and define their quality
based on repeatability rate and information content. One of
the most widely used schemes for tracking feature points is
the KLT tracker [9], which combines feature selection and
tracking across a sequence of images by minimizing the sum
of squared intensity differences over windows in two frames.
A probabilistic technique for feature matching in a multireso-
lution Bayesian framework was developed in [10] and used in
uncalibrated image mosaicing. In [11], the authors introduced
the use of Zernike orthogonal polynomials to compute the
relative rigid transformations between images. This approach allows
the recovery of rotational and scaling parameters without the
need for extensive correlation and search algorithms. Precise
1520-9210/04$20.00 © 2004 IEEE