IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 3, JUNE 2004 423

Wide Baseline Image Registration With Application to 3-D Face Modeling

Amit K. Roy-Chowdhury, Rama Chellappa, Fellow, IEEE, and Trish Keaton

Abstract—Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, three-dimensional (3-D) model alignment, creation of panoramic views, etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching two-dimensional (2-D) shapes of the different features of the face (e.g., eyes, nose, etc.). A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellation of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3-D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications.

Index Terms—Biometrics, face modeling, feature correspondence, image registration.

I. INTRODUCTION

ESTABLISHING correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. The difficulty of the problem is compounded by the fact that the images may be obtained under different conditions of lighting and camera settings.
However, its solution is an important step in many applications like wide baseline stereo, three-dimensional (3-D) model alignment, creation of panoramic views, etc. Numerous methods have been tried to solve this problem, ranging from techniques which take advantage of the knowledge of the geometry of the scene to ones which use different information theoretic measures to compute similarity.

Manuscript received April 15, 2002; revised September 21, 2002. This work was supported in part by the National Science Foundation under Grant 0086075. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Chalapathy Neti.
A. K. Roy-Chowdhury was with the Center for Automation Research, University of Maryland, College Park, MD 20742 USA. He is now with the Department of Electrical Engineering, University of California, Riverside, CA 92521 USA (e-mail: amitrc@ee.ucr.edu).
R. Chellappa is with the Department of Electrical and Computer Engineering and the Center for Automation Research, University of Maryland, College Park, MD 20742 USA (e-mail: rama@cfar.umd.edu).
T. Keaton is with the Department of Signal and Image Processing, HRL Laboratories LLC, Malibu, CA 90265 USA (e-mail: pakeaton@hrl.com).
Digital Object Identifier 10.1109/TMM.2004.827511

A. Literature Review

One of the well-known methods for registration is the iterative closest point (ICP) algorithm [1] of Besl and McKay. It uses a mean-square distance metric which converges monotonically to the nearest local minimum. It was used for registering 3-D shapes by considering the full six degrees of freedom in the motion parameters. It has been extended to include the Levenberg–Marquardt nonlinear optimization and robust estimation techniques to minimize the registration error [2].
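
As a rough illustration of the closest-point iteration just described, the sketch below alternates between matching each point to its nearest neighbor and solving for the least-squares rigid motion. It is a simplified 2-D version (one rotation angle and a translation) rather than the full six-degree-of-freedom 3-D formulation of [1], and it uses a brute-force nearest-neighbor search. The function names (`fit_rigid_2d`, `apply_rigid_2d`, `icp_2d`) and the stopping tolerance are assumptions made for this example and do not come from the cited work.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ux, uy = px - sx, py - sy          # centered source point
        vx, vy = qx - dx, qy - dy          # centered target point
        dot += ux * vx + uy * vy
        cross += ux * vy - uy * vx
    theta = math.atan2(cross, dot)         # angle maximizing the alignment
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)            # translation aligning the centroids
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty

def apply_rigid_2d(points, theta, tx, ty):
    """Rotate every point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def mean_sq_dist(a, b):
    """Mean squared distance between paired points."""
    return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(a, b)) / len(a)

def icp_2d(src, dst, max_iter=50, tol=1e-12):
    """Hypothetical minimal ICP loop; returns aligned points and the error history."""
    cur = list(src)
    errors = []
    for _ in range(max_iter):
        # Step 1: pair each current point with its closest target point.
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        # Step 2: best rigid motion for these tentative pairs.
        theta, tx, ty = fit_rigid_2d(cur, matched)
        cur = apply_rigid_2d(cur, theta, tx, ty)
        errors.append(mean_sq_dist(cur, matched))
        # Stop once the mean-square error no longer decreases noticeably.
        if len(errors) > 1 and errors[-2] - errors[-1] < tol:
            break
    return cur, errors
```

The recorded error history is non-increasing from one iteration to the next, which mirrors the monotonic convergence noted above. Like ICP itself, the loop only finds the nearest local minimum, so it needs a reasonable initial alignment.
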
Another well-known method for registering 3-D shapes is the work of Vemuri and Aggarwal where they used range and intensity data for reconstructing complete 3-D models from partial ones [3]. Registering range data for the purpose of building surface models of 3-D objects was also the focus of the work in [4]. Matching image tokens across triplets, rather than pairs, of images has also been considered. In [5], the authors developed a robust estimator for the trifocal tensor based upon corresponding tokens across an image triplet. This was then used to recover 3-D structure. Reconstructing 3-D structure was also considered in [6] using stereo image pairs from an uncalibrated video sequence. However, most of these algorithms work given good initial conditions, e.g., for 3-D model alignment, the partial models have to be brought into approximate positions. The problem of automatic “crude” registration (in order to obtain good initial conditions) was addressed in [7], where the authors used bitangent curve pairs which could be found and matched efficiently.

In the above methods, geometric properties are used to align 3-D shapes. Another important area of interest for registration schemes is two-dimensional (2-D) image matching, which can be used for applications like image mosaicing, retrieval from a database, medical imaging, etc. Two-dimensional matching methods rely on extracting features or interest points. In [8], the authors show that interest points are stable under different geometric transformations and define their quality based on repeatability rate and information content. One of the most widely used schemes for tracking feature points is the KLT tracker [9], which combines feature selection and tracking across a sequence of images by minimizing the sum of squared intensity differences over windows in two frames.
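
To make the sum-of-squared-differences (SSD) criterion concrete, the sketch below scores a window in one frame against candidate windows in a second frame and keeps the displacement with the lowest SSD. Note that this is a brute-force search over integer displacements, used here only as a stand-in: the KLT tracker [9] minimizes the same kind of cost with a gradient-based iteration and also includes a feature selection step, neither of which is reproduced here. The names `ssd` and `track_window`, and the representation of frames as nested lists of intensities, are assumptions for this example.

```python
def ssd(img_a, img_b, top_a, left_a, top_b, left_b, size):
    """Sum of squared intensity differences between two size x size windows."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            diff = img_a[top_a + r][left_a + c] - img_b[top_b + r][left_b + c]
            total += diff * diff
    return total

def track_window(img_a, img_b, top, left, size, radius):
    """Return (score, dy, dx) for the integer shift within +/- radius that
    minimizes the SSD between the window in img_a and the shifted window in img_b.
    Hypothetical helper: exhaustive search, not the KLT gradient iteration."""
    rows, cols = len(img_b), len(img_b[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            # Skip candidate windows that fall outside the second frame.
            if t < 0 or l < 0 or t + size > rows or l + size > cols:
                continue
            score = ssd(img_a, img_b, top, left, t, l, size)
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best
```

For example, if the second frame is the first frame shifted down by one row and right by two columns, calling `track_window` on a textured window with a search radius of 2 should return a displacement of `(1, 2)` with zero SSD.
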
A probabilistic technique for feature matching in a multiresolution Bayesian framework was developed in [10] and used in uncalibrated image mosaicing. In [11], the authors introduced the use of Zernike orthogonal polynomials to compute the relative rigid transformations between images. It allows the recovery of rotational and scaling parameters without the need for extensive correlation and search algorithms.

1520-9210/04$20.00 © 2004 IEEE

Precise