The Generic Viewpoint Assumption and Planar Bias

A.L. Yuille, Member, IEEE, James M. Coughlan, and S. Konishi

Abstract: We show that generic viewpoint and lighting assumptions resolve standard visual ambiguities by biasing toward planar surfaces. Our model uses orthographic projection with a two-dimensional affine warp and Lambertian reflectance functions, including cast and attached shadows. We use uniform priors on nuisance variables such as viewpoint direction and the light source. Limitations of using uniform priors on nuisance variables are discussed.

Index Terms: Generic viewpoint, Bayesian inference, visual ambiguities.

1 INTRODUCTION

The generic viewpoint assumption has been suggested as a way to resolve visual ambiguities [4], [11], [7] and has been used to explain perceptual phenomena, e.g., [1] and references cited. The idea is that some interpretations of an image correspond to accidental views and are unstable, in the sense that small changes in the viewing position would induce large changes in the image. The generic viewpoint assumption favors those image interpretations which are stable with respect to small changes of viewpoint. This generic assumption can be extended to apply to other variables such as lighting.

A precise mathematical formulation of the generic viewpoint assumption was proposed by Freeman [7]. He formulated the interpretation problem as probabilistic inference, where the viewpoint direction is treated as a nuisance parameter to be integrated out [12]. More recently, Weinshall and Werman [13] analyzed the generic viewpoint assumption using a different formulation than Freeman's. They represented objects as point features and showed that the assumption causes a bias toward planar objects. They hypothesized that this planar bias would also hold for Freeman's formulation.

In this paper, we examine the effect of the generic viewpoint assumption for resolving visual ambiguities using Freeman's formulation.
In particular, we examine those shape and shading ambiguities which have been found by analyzing the geometry of point set features [10], [6], [9] and the photometric properties of objects with Lambertian reflectance functions [2], [17]. We use uniform priors on nuisance variables, such as viewpoint and lighting, and discuss the limitations of these priors later. Our results show that there is a bias toward planar surfaces when the generic assumption is used for viewpoint, lighting, or a combination of both. This proves Weinshall and Werman's hypothesis and goes further by including shading and shadowing effects.

Like Weinshall and Werman [13], we use orthographic projection and allow for two-dimensional affine warps on the image plane. We treat the affine warps either as nuisance parameters to be integrated out or as quantities to be estimated (both approaches yield a planar bias if the warps have uniform priors). These two treatments correspond to alternative ways of thinking about the warps. The warps could, for example, correspond to the parameters of an affine camera [9], which motivates integrating them out. Alternatively, estimating the warps leads to an affine invariant measure of similarity between images, as advocated by Werman and Weinshall [14]. In either case, the affine warp can be justified by: 1) assuming that the camera parameters are only approximately known and/or 2) modifying the orthographic projection equations to allow for perspective effects [9].

We first give the probabilistic formulation of the generic viewpoint assumption in Section 2, define the ambiguities we will be dealing with in Section 3, prove our results in Section 4, and then close with a discussion in Section 5.

2 VISUAL AMBIGUITIES AND THE GENERIC VIEWPOINT ASSUMPTION

This section describes the mathematical framework for visual ambiguities and the generic viewpoint assumption. The framework is general and applies to any probabilistic estimation problem [12].
We assume that the image formation process is specified by a likelihood function \(P(I \mid O, h)\), where \(I\) is the observed image, \(O\) is the object being viewed, and \(h\) is a nuisance variable (e.g., viewpoint or lighting). Visual ambiguities arise when there are many different ways of generating the same image. For example, if \(P(I \mid O, h) = P(I \mid \hat{O}, \hat{h})\), then it seems difficult to distinguish between \((O, h)\) and \((\hat{O}, \hat{h})\).

A large class of visual ambiguities (see Section 3) correspond to a group of transformations that can be made on an object and the nuisance variable. For example, suppose we have \(P(I \mid O, h) = P(I \mid f_t(O), f_t(h))\), where \(f_t(\cdot)\) is an element of a group of transformations on the object \(O\) and nuisance parameter \(h\), and \(t\) indexes the group element. Then, we state that the likelihood function is invariant to the group of transformations \(\{f_t(\cdot)\}\).

Much of the work on visual ambiguities, e.g., [10], [6], [2], [9], [17], assumes that the image formation model is purely deterministic. This special case can be obtained from our formulation by setting the likelihood function to be a delta function (e.g., \(P(I \mid O, h) = \delta(I - F(O, h))\) for some function \(F(\cdot, \cdot)\), where \(\delta(\cdot)\) is the Dirac delta function). But these ambiguities will also remain even if we allow for noise in the imaging model; see Section 4.

The Generic Viewpoint Assumption (GVA) [4], [11], [7] is a method for resolving these ambiguities. First, the problem is expressed as Bayesian estimation by placing prior distributions \(P(O)\), \(P(h)\) on \(O\) and \(h\). The task of estimating \(O\) and \(h\) from \(I\) can be formulated as Bayesian inference using the posterior distribution:

\[
P(O, h \mid I) = \frac{P(I \mid O, h)\, P(O)\, P(h)}{\int d\hat{h}\, d\hat{O}\; P(I \mid \hat{O}, \hat{h})\, P(\hat{O})\, P(\hat{h})}. \tag{1}
\]

Freeman's proposal is to estimate \(O\) alone after integrating out the nuisance parameter \(h\). This corresponds to the standard procedure for dealing with nuisance variables in statistics [12].
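As a minimal numerical sketch of integrating out a nuisance variable (an illustration under assumed toy settings, not the model analyzed in this paper), suppose \(O\) and \(h\) are scalars, the image is a single number generated by a hypothetical forward model \(F(O, h) = O h\), and the noise is Gaussian. The transformation \((O, h) \rightarrow (tO, h/t)\) then leaves the likelihood unchanged, so the joint likelihood is equally high along the whole ridge \(O h = I\). The function name, the grids, and all parameter values below are assumptions chosen for illustration.

```python
import numpy as np

SIGMA = 0.05  # assumed noise standard deviation (illustrative value)


def likelihood_grid(image, objects, nuisances, sigma=SIGMA):
    """Unnormalized Gaussian likelihood P(I | O, h) on an (O, h) grid.

    Assumes the hypothetical forward model F(O, h) = O * h.
    Rows index candidate objects O, columns index nuisance values h.
    """
    predicted = objects[:, None] * nuisances[None, :]
    return np.exp(-0.5 * ((image - predicted) / sigma) ** 2)


objects = np.linspace(0.5, 4.0, 71)       # candidate values of O
nuisances = np.linspace(0.1, 4.0, 3901)   # grid over h (uniform prior)
image = 1.0                               # observed image I

lik = likelihood_grid(image, objects, nuisances)

# Joint view: the best achievable likelihood is close to 1 for every O,
# so maximizing over (O, h) jointly cannot single out one object.
ridge = lik.max(axis=1)

# Integrate out h with a Riemann sum. With uniform priors the priors only
# rescale the result, so the integrated likelihood is enough to rank O.
dh = nuisances[1] - nuisances[0]
marginal = lik.sum(axis=1) * dh

best_object = objects[np.argmax(marginal)]
print("spread of ridge values:", ridge.max() - ridge.min())
print("O maximizing the integrated likelihood:", best_object)
```

In this toy setting, the peak joint likelihood is essentially the same for every candidate \(O\), whereas the likelihood summed over \(h\) is largest for the smallest \(O\) on the grid. That candidate is the one whose predicted image changes most slowly as \(h\) varies, so a wider range of \(h\) values is consistent with the observation.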
It reduces to estimating \(O\) from:

\[
P(O \mid I) = \int dh\; P(O, h \mid I). \tag{2}
\]

Freeman's insight [7] is that the integration over \(h\) is often sufficient to resolve many visual ambiguities even if the prior distributions on \(O\) and \(h\) are uniform. (For these priors, the posterior distribution \(P(O, h \mid I)\) is ambiguous if the likelihood is, i.e., \(P(O, h \mid I) = P(f_t(O), f_t(h) \mid I)\) provided \(P(I \mid O, h) = P(I \mid f_t(O), f_t(h))\).) If uniform priors are assumed, then the generic viewpoint assumption reduces to simply integrating the likelihood function to obtain \(P(O \mid I) = (1/Z) \int dh\; P(I \mid O, h)\), where \(Z\) is a constant, and solving for:

\[
O^{*} = \arg\max_{O} \int dh\; P(I \mid O, h). \tag{3}
\]

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 6, JUNE 2003, p. 775.

The authors are with the Smith-Kettlewell Eye Research Institute, 2318 Fillmore Street, San Francisco, CA 94115. E-mail: {yuille, coughlan, konishi}@ski.org.

Manuscript received 29 Apr. 2002; revised 10 Oct. 2002; accepted 13 Oct. 2002. Recommended for acceptance by W. Freeman. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 116428.

0162-8828/03/$17.00 © 2003 IEEE. Published by the IEEE Computer Society.

To understand how the GVA works using uniform priors, suppose we have an ambiguity so that \(P(I \mid O, h) = P(I \mid f_t(O), f_t(h))\). We calculate: