550 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 4, JUNE 2000

3-D Scene Reconstruction with Viewpoint Adaptation on Stereo Displays

André Redert, Emile Hendriks, and Jan Biemond, Fellow, IEEE

Abstract—In this paper, we propose a generic algorithm for the geometrically correct reconstruction of 3-D scenes on stereo displays with viewpoint adaptation. This forms the basis of multiviewpoint systems, which are currently the most promising candidates for real-time implementations of 3-D visual communication systems. The reconstruction algorithm requires 3-D tracking of the viewers' eyes with respect to the display. We analyze the effect of eye-tracking errors. A simple bound will be derived, below which reconstruction errors cannot be observed. We design a multiviewpoint system using a recently introduced image-based scene representation. This design formed the basis of the real-time multiviewpoint system that was recently built in the European PANORAMA project. Experiments with both natural and synthetic scenes show that the proposed reconstruction algorithm performs well. The experiments are performed by computer simulation and on the real-time PANORAMA system.

Index Terms—Motion parallax, multiviewpoint system, stereo displays, viewpoint adaptation, 3-D scene reconstruction.

I. INTRODUCTION

STEREO and 3-D systems are emerging rapidly in the area of human visual communication. Applications can be found in medical areas (remote expert consultancy during operations), industrial areas (inspection in hazardous environments), and interpersonal communication in which the telepresence is enhanced.

In "through-the-window" based systems, in which the scene is reconstructed by a 2-D display [28], ideal scene reconstruction can, in principle, be achieved by holograms.
They allow any number of viewers simultaneously and provide for each of them the stereoscopic depth cue (a different image presented to each eye), the lens accommodation cue (the focal length of the eye lens is related to the depth of the chosen object of interest), and the motion parallax cue (the scene viewpoint changes when the viewer moves). However, the current state of technology does not allow real-time holographic video acquisition systems.

For implementations of real-time 3-D communication systems, stereo and multiviewpoint systems are currently the most promising candidates [8]. These systems aim at providing as many of the aforementioned cues as possible while minimizing geometric and photometric reconstruction errors.

Manuscript received March 15, 1999; revised September 30, 1999. This work was supported by the European PANORAMA project. This paper was recommended by Guest Editor M. G. Strintzis.

A. Redert was with the Information and Communication Theory Group, Delft University of Technology, 2628 CD Delft, The Netherlands. He is now with Philips Research Laboratories, 5656 AA Eindhoven, The Netherlands.

E. Hendriks and J. Biemond are with the Information and Communication Theory Group, Delft University of Technology, 2628 CD Delft, The Netherlands.

Publisher Item Identifier S 1051-8215(00)04887-4.

Fig. 1. Geometric distortion in a stereo system.

In a stereo system, the scene is captured by two normal cameras, the images of which are transmitted and shown on a stereo CRT or LCD-like display [3]. By some means, the two images are projected separately onto the left and right eye. Such a stereo system provides the stereoscopic depth cue, which gives some impression of depth in the scene. The accommodation cue does not work, since the lens of the human eye focuses on the display, regardless of the depth of the object on which the viewer focuses his attention.
This results in a conflict between the convergence of the two eyes and the accommodation, causing visual strain [9], [21]. The motion parallax cue also does not appear, since the shown images do not depend on the position of the viewer.

At most one specific stereo viewpoint provides a geometrically undistorted scene reconstruction [1], [12] (see Fig. 1). Any movement away from this position results in geometric distortion of the observed scene. The distortion depends nonlinearly on both the viewing position and the scene point positions. It yields distortion of position, angles, and scale (e.g., the so-called "puppet theater effect" [12]). The minimization of visual strain due to such distortions is not easy. It requires a careful setup of cameras and display [1], [9], [21], guided by subjective tests that provide information about the resilience of human vision to the distortions.

Multiviewpoint systems reduce the distortions shown in Fig. 1 by the introduction of motion parallax: they provide images to the viewer that change with viewing position. Two different approaches can be distinguished. First, displays exist that provide a limited number of viewpoints simultaneously [7], [28]. These displays may serve multiple viewers. However, the freedom of movement is restricted (mostly to horizontal motion only) and the motion parallax is not continuous but discrete.

The second type of multiviewpoint system provides only a single stereo-image pair to a single viewer (see Fig. 2). In this system, a model of the scene first needs to be acquired, e.g., on the basis of stereo-image capturing and analysis. After transmission of the model, a new stereo pair is synthesized at the reconstruction side. The motion parallax cue is then provided by continuously adapting the displayed stereo images to the current eye positions.

1051–8215/00$10.00 © 2000 IEEE
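To make the geometric argument concrete, the following minimal sketch models the display as the plane z = 0 in a 2-D top view (x horizontal, z depth), with the viewer at z < 0 and scene points at z > 0. It shows that a fixed stereo pair reconstructs a scene point exactly from the design viewpoint only, and that a shifted viewer perceives a displaced point. The function names, the pinhole-ray model, and all numeric values are illustrative assumptions, not the authors' exact formulation.

```python
# 2-D top-view sketch: display plane at z = 0, viewer at z < 0,
# scene behind the display at z > 0. Illustrative assumptions only.

def project_to_display(eye, point):
    """x-coordinate where the ray from the eye to the scene point
    crosses the display plane z = 0."""
    (ex, ez), (px, pz) = eye, point
    t = -ez / (pz - ez)                # fraction of the way from eye to point
    return ex + t * (px - ex)

def perceived_point(eye_l, eye_r, sx_l, sx_r):
    """Scene point perceived by a viewer whose eyes (at equal depth ez)
    look through fixed screen points sx_l, sx_r: the intersection of the
    two viewing rays."""
    (exl, ez), (exr, _) = eye_l, eye_r
    dl, dr = sx_l - exl, sx_r - exr    # ray offsets across the depth -ez
    z = ez * (1.0 - (exr - exl) / (dl - dr))
    x = exl + (z - ez) / (-ez) * dl
    return x, z

if __name__ == "__main__":
    point = (0.0, 1.0)                           # scene point 1 m behind display
    eye_l, eye_r = (-0.03, -0.5), (0.03, -0.5)   # design viewpoint, 0.5 m away
    sx_l = project_to_display(eye_l, point)      # fixed screen points of the
    sx_r = project_to_display(eye_r, point)      # displayed stereo pair

    x0, z0 = perceived_point(eye_l, eye_r, sx_l, sx_r)
    print(round(x0, 6), round(z0, 6))   # 0.0 1.0 -> reconstructed exactly

    # A viewer shifted 0.1 m to the right sees the same screen points,
    # so the perceived point is laterally displaced (no motion parallax):
    shifted_l, shifted_r = (0.07, -0.5), (0.13, -0.5)
    x1, z1 = perceived_point(shifted_l, shifted_r, sx_l, sx_r)
    print(round(x1, 6), round(z1, 6))   # -0.2 1.0 -> displaced by -0.2 m
```

Viewpoint adaptation removes exactly this displacement: with tracked eye positions, the screen points are recomputed per frame (here, by calling project_to_display with the current rather than the design eye positions), so the intersection of the viewing rays again coincides with the true scene point.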