On the Use of Shadows in Stance Recovery

Alfred M. Bruckstein,¹ Robert J. Holt,¹ Yves D. Jean,² Arun N. Netravali¹

¹ Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974
² Avaya Communication, Murray Hill, NJ 07974

“. . . the highest sum would be too little to pay for such a priceless shadow.”—A. von Chamisso, Peter Schlemihl: The Man Who Sold His Shadow

ABSTRACT: The image of an object and of the shadow it casts on a planar surface provides important cues for three-dimensional (3D) stance recovery. We assume that the position, with respect to a pinhole camera, of the plane on which the shadow lies is known and that the position of the light source is unknown. If the light source is sufficiently far away that parallel projection may be assumed, then knowledge of two point correspondences between images of feature points and images of their shadows is enough to determine the position of the object and the direction of the light source. If the light source is close enough that the shadow points are obtained via perspective projection, then there is a one-parameter infinite family of solutions for the position of the object and the light source. Determining the stance of an object is highly sensitive to noise, so we provide algorithms for stance recovery that take into account known information about the object. In our experiments, the errors in the location of the 3D feature points obtained by these algorithms are generally less than 0.2% times the error in pixels in the image points, and the errors in the 3D directions of the links are roughly 0.04° times the error in pixels, normalized by the distance to the object from the camera and the length of the link. © 2001 John Wiley & Sons, Inc. Int J Imaging Syst Technol, 11, 315–330, 2000

I. INTRODUCTION

Suppose an articulated object, like the human body, is seen in an image along with the shadow it casts on a planar surface, e.g., the ground or a wall.
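The abstract's key assumption is that the pose of the shadow plane with respect to the camera is known. A consequence worth making concrete is that each shadow point observed in the image then determines a unique 3D point on that plane, by intersecting the camera ray with the plane. The following sketch illustrates this back-projection step under an assumed pinhole model; the names, the intrinsic matrix, and the plane representation n·X = d are illustrative choices, not the paper's notation:

```python
import numpy as np

def backproject_to_plane(pixel, K, plane_n, plane_d):
    """Intersect the camera ray through `pixel` with the plane n.X = d.

    The camera center is at the origin; K is an assumed 3x3 intrinsic
    matrix; (plane_n, plane_d) describe the known shadow plane.
    Returns the 3D point on the plane imaged at `pixel`.
    """
    # Direction of the ray through the pixel (pinhole model).
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    # Solve n . (t * ray) = d for the ray parameter t.
    t = plane_d / (plane_n @ ray)
    return t * ray

# Illustrative camera and a ground plane one unit below the camera
# center (camera y axis pointing downward, image convention).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
n, d = np.array([0.0, 1.0, 0.0]), 1.0
P = backproject_to_plane((400.0, 500.0), K, n, d)  # 3D shadow point, lies on n.X = d
```

With the shadow points lifted to 3D this way, the remaining unknowns are the feature points themselves and the light source, which is the problem the abstract characterizes for the parallel and perspective cases.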
It is clear that the additional information provided by the shadow should make it easier for the viewer to assess the object’s three-dimensional (3D) location and shape. In this paper, we investigate shape and pose recovery from images of objects casting shadows under a single strong source of light, such as the sun or a nearby omnidirectional light source.

Few papers in the computer vision literature report work on using shadows to recover scene geometry. The earliest work is perhaps due to Shafer (1985) and Shafer and Kanade (1983), who analyzed the role that shadows and silhouettes play in the automatic interpretation of images of solid objects under various viewpoints and illuminations. Shadows have also been used to delineate and locate objects in images (Thompson et al., 1987) and to analyze scenes with electronic components (Tsuji et al., 1984). Researchers have used shadows to analyze aerial imagery, mostly of urban or industrial areas (Nevatia, 1998; Shufelt, 1996, 1999). A later development is the work of Kender and Smith (1987) and Yang and Kender (1996), who proposed an active vision method of illuminating a scene with a moving light source and learning about the scene geometry from shadows that vary over time.

Paralleling the work in computer vision, and following in the footsteps of artists like Leonardo da Vinci, who realized the importance of shadows in rendering realistic scenes by providing a qualitative sense of depth, vision researchers have assessed the importance of shadows for human image interpretation (for works on depth and motion perception, see Kersten et al., 1996, 1997; Yonas et al., 1978). Knill et al. (1997) summarized the geometric issues involved in shadow formation and analyzed in depth the ways shadows provide perceptual cues for scenes of objects with smooth boundaries. The use of shadows in photography was also investigated by Bouguet and Perona (1998, 1999).
As far as we know, the problem of viewing articulated thin objects under strong single-source illumination, with a ground plane that is accurately located with respect to the camera, has not yet been discussed. Although these might be considered rather restrictive assumptions, they are realistic in a variety of practical applications, such as interpreting scenes at sports events (e.g., tennis, soccer), where people tracking and stance recovery are needed for automated understanding and virtual replay.

This paper is organized as follows: the fundamentals of pose from shadows are discussed in Section II, least squares solutions for pose estimation are provided in Section III, and practical considerations together with experiments on synthetic and real images are presented in Sections IV and V.

Correspondence to: Robert J. Holt. E-mail: rjh@research.bell-labs.com
Alfred M. Bruckstein’s permanent address: Department of Computer Science, Technion—IIT 32000, Haifa, Israel.

II. FUNDAMENTALS OF POSE FROM SHADOWS

We will assume that an articulated 3D object is viewed under strong illumination by a perspective projection camera. The object casts shadows on a (background) plane, and these shadows, by assumption, are also at least partially visible in the image. The illumination that
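The shadow-formation geometry underlying this setup can be stated directly: under a nearby point source, the shadow of a feature point is the intersection of the line from the light through the point with the background plane; under a distant source, all shadow rays share one direction. A small sketch of both cases, using an illustrative plane representation n·X = d rather than the paper's notation:

```python
import numpy as np

def cast_shadow(point, light, plane_n, plane_d):
    """Shadow of `point` cast by a nearby point source `light` onto the
    plane n.X = d: intersect the line light + t*(point - light) with it."""
    direction = point - light
    t = (plane_d - plane_n @ light) / (plane_n @ direction)
    return light + t * direction

def cast_shadow_parallel(point, light_dir, plane_n, plane_d):
    """Distant-source (parallel projection) variant: every shadow ray
    has the same direction `light_dir`."""
    t = (plane_d - plane_n @ point) / (plane_n @ light_dir)
    return point + t * light_dir

# Example: ground plane z = 0, point source above and behind the point.
shadow = cast_shadow(np.array([1.0, 1.0, 2.0]),   # feature point
                     np.array([0.0, 0.0, 4.0]),   # light position
                     np.array([0.0, 0.0, 1.0]), 0.0)
# The shadow lands at (2, 2, 0), on the plane z = 0.
```

The forward model is trivial; the paper's problem is the inverse one, in which the feature points and the light are unknown and only the images of the points and of their shadows are observed.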