Digital Object Identifier (DOI) 10.1007/s00138-002-0103-0
Machine Vision and Applications (2003) 14: 94–102
© Springer-Verlag 2003

Omnidirectional image-based modeling: three approaches to approximated plenoptic representations

Hiroshi Ishiguro¹, Kim C. Ng², Richard Capella², Mohan M. Trivedi²
¹ Department of Adaptive Machine Systems, Osaka University, Japan
² Department of Electrical and Computer Engineering, University of California, San Diego, USA

Abstract. In this paper we present a set of novel methods for image-based modeling using omnidirectional vision sensors. The basic idea is to acquire plenoptic representations directly and efficiently with omnidirectional vision sensors. The three methods, in order of increasing complexity, are direct memorization, discrete interpolation, and smooth interpolation. Results of these methods are compared visually with ground-truth images taken by a standard camera moved along the same path. The experimental results demonstrate that our methods are successful at generating high-quality virtual images; in particular, the smooth interpolation technique approximates the plenoptic function most closely. A comparative analysis of the computational costs associated with the three methods is also presented.

Keywords: Video array – Real-time tracking – Intelligent room – Omnidirectional camera – Face detection

1 Introduction

Visual modeling, as discussed in this paper, deals with the development of a computer-based representation of the 3D volumetric as well as illumination-cum-reflectance properties of an environment from any desired vantage point. Efficient means for deriving such visual models are necessary for a wide range of applications in virtual/augmented reality, telepresence, and remote surveillance. Automatic means for deriving such models are in great demand and have great potential.
However, developing such an efficient algorithm can be difficult, especially if the scene to be covered is large and contains dynamic objects (not to mention the additional difficulties introduced if concurrent, multiple novel views are allowed). In this paper, we discuss a set of methods to address this need.

Correspondence to: H. Ishiguro (e-mail: ishiguro@sys.wakayama-u.ac.jp)

The existing literature contains two basic types of visual modeling methods that commonly utilize multiple cameras/images. One is an extension of the multiple-camera stereo developed in computer vision; the other, from computer graphics, approximates the plenoptic function [1] with densely sampled raw images. The plenoptic function has been proposed as an ideal function representing complete visual information in a 3D space. The approaches developed in this paper represent a hybrid of these two methods based on the unique properties of omnidirectional images (ODIs). Our methods do not recover 3D structure for the whole scene; instead, we extract only a single 3D point lying along the center viewing direction of the desired virtual view plane. The virtual view plane at that 3D point is used to affine-transform a selected raw image into the novel view. The raw image is chosen based on its distance and viewing direction relative to the desired virtual view. The basic ideas are summarized as follows:

- Omnidirectional vision sensors (ODVSs), developed by us, directly approximate the plenoptic representation of the environment.
- 3D geometrical constraints¹ extracted by our modified multiple-camera stereo are used to interpolate novel views between omnidirectional images (where ODVSs do not exist).

In the remainder of this section, we survey related work and state our approach. The following sections explain our methods through three modeling steps using ODVSs.

Generally, 3D reconstruction by stereo is not sufficiently stable for practical use.
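As background, the plenoptic function [1] is commonly written as a seven-parameter radiance function: the intensity of light observed from a viewpoint at position (V_x, V_y, V_z), in viewing direction (θ, φ), at wavelength λ and time t:

```latex
P = P(\theta, \phi, \lambda, t, V_x, V_y, V_z)
```

Image-based rendering methods, including those described here, can be viewed as sampling this function densely and interpolating between samples, rather than reconstructing explicit 3D geometry.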
On the other hand, recent progress in vision devices and the related computer interfaces enables us to use many cameras simultaneously. Multiple-camera stereo using many cameras compensates for the problem of matching between images and provides robust and stable range information. Okutomi and Kanade [2] proposed multiple-baseline stereo. The stereo was applied to a virtual-reality system called the virtual dome [3], in which fifty-six precisely calibrated cameras observe targets inward from the surroundings and reconstruct 3D models in a small, constrained area.

Boyd and colleagues [4] developed a 3D modeling system called multiple-perspective interactive video. Multiple-baseline stereo performs template matching on the image plane, while the method of Boyd et al. finds corresponding

¹ This means range information. However, our methods do not directly refer to range information.