Digital Object Identifier (DOI) 10.1007/s00138-002-0103-0
Machine Vision and Applications (2003) 14: 94–102
© Springer-Verlag 2003

Omnidirectional image-based modeling: three approaches to approximated plenoptic representations

Hiroshi Ishiguro¹, Kim C. Ng², Richard Capella², Mohan M. Trivedi²

¹ Department of Adaptive Machine Systems, Osaka University, Japan
² Department of Electrical and Computer Engineering, University of California, San Diego, USA
Abstract. In this paper we present a set of novel methods for
image-based modeling using omnidirectional vision sensors.
The basic idea is to directly and efficiently acquire plenoptic
representations by using omnidirectional vision sensors. The
three methods, in order of increasing complexity, are direct
memorization, discrete interpolation, and smooth interpolation. Results of these methods are compared visually with ground-truth images captured by a standard camera moved along the same path. The experimental results demonstrate
that our methods are successful at generating high-quality vir-
tual images. In particular, the smooth interpolation technique
approximates the plenoptic function most closely. A compar-
ative analysis of the computational costs associated with the
three methods is also presented.
Keywords: Video array – Real-time tracking – Intelligent
room – Omnidirectional camera – Face detection
1 Introduction
Visual modeling, as discussed in this paper, deals with the
development of a computer-based representation of the 3D
volumetric as well as illumination-cum-reflectance properties
of an environment from any desired vantage point. Efficient means for deriving such visual models are needed for a wide range of applications in virtual/augmented reality, telepresence, and remote surveillance, so automatic modeling methods are in great demand. However, developing an efficient algorithm can be difficult, especially when the scene to be covered is large and dominated by dynamic objects (not to mention the additional difficulties introduced when concurrent, multiple novel views are allowed). In this paper, we present a set of methods that address this need.
Correspondence to: H. Ishiguro (e-mail: ishiguro@sys.wakayama-u.ac.jp)

The existing literature contains two basic types of visual modeling methods that commonly utilize multiple cameras/images. One is an extension of the multiple-camera stereo developed in computer vision; the other approximates the plenoptic function [1] with densely sampled raw images in computer graphics. The plenoptic function has been proposed as an ideal function representing complete visual information
in a 3D space. The approaches developed in this paper are a hybrid of these two methods, based on the unique properties of omnidirectional images (ODIs). Our methods do not reconstruct 3D structure for the whole scene; in fact, we extract only a single 3D point lying along the central viewing direction of the desired virtual view plane. The virtual view plane at that 3D point is used to affine-transform a selected raw image into the novel view. The raw image is selected according to its distance and viewing direction relative to the desired virtual view. The basic ideas are summarized as follows:
• Omnidirectional vision sensors (ODVSs), developed by us, directly approximate the plenoptic representation of the environment.
• 3D geometrical constraints¹ extracted by our modified multiple-camera stereo are used to interpolate novel views between omnidirectional images (where ODVSs do not exist).
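For reference, the plenoptic function proposed by Adelson and Bergen [1] records the intensity of light at every viewpoint, viewing direction, wavelength, and time:

    P = P(θ, φ, λ, t, V_x, V_y, V_z),

where (V_x, V_y, V_z) is the viewing position, (θ, φ) the viewing direction, λ the wavelength, and t time. An ODVS placed at one position samples P over all (θ, φ) at once, which is why an array of ODVSs gives a natural, direct sampling of this function.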
In the remaining part of this section, we survey the related
works and state our approach. The following sections explain
our methods through three modeling steps using ODVSs.
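To make the selection-and-warp step concrete, the following is a minimal, purely illustrative sketch: the cost function, its weight, and all names are our own assumptions for exposition, not the implementation used in this paper.

```python
import numpy as np

def select_sensor(sensors, view_pos, view_dir, w=0.5):
    """Pick the ODVS whose position and viewing direction best match the
    desired virtual view (illustrative cost: distance plus weighted angle)."""
    best, best_cost = None, np.inf
    for i, (pos, direction) in enumerate(sensors):
        dist = np.linalg.norm(pos - view_pos)
        ang = np.arccos(np.clip(np.dot(direction, view_dir), -1.0, 1.0))
        cost = dist + w * ang
        if cost < best_cost:
            best, best_cost = i, cost
    return best

def affine_warp_points(pts, A):
    """Map Nx2 pixel coordinates through a 2x3 affine matrix A."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ A.T

# Two sensors looking along +x; the virtual viewpoint lies nearer the second.
sensors = [(np.array([0.0, 0.0]), np.array([1.0, 0.0])),
           (np.array([5.0, 0.0]), np.array([1.0, 0.0]))]
idx = select_sensor(sensors, np.array([4.0, 0.0]), np.array([1.0, 0.0]))
print(idx)  # -> 1: the closer, equally well-aligned sensor wins
```

Once a sensor is selected, its raw image pixels are carried into the novel view by the affine map determined by the virtual view plane at the single extracted 3D point.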
Generally, 3D reconstruction by stereo is not sufficiently stable for practical use. On the other hand, recent progress in vision devices and the related computer interfaces enables us to use many cameras simultaneously. Multiple-camera stereo compensates for the ambiguity of matching between images and provides robust, stable range information.
Okutomi and Kanade [2] proposed multiple-baseline stereo, which was applied to a virtual-reality system called the virtual dome [3]. There, fifty-six precisely calibrated cameras observe targets inward from the surroundings and reconstruct 3D models in a small constrained area.
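The key idea of multiple-baseline stereo can be illustrated in a few lines: because disparity is linear in inverse depth for every baseline, the matching costs (SSDs) of all camera pairs can be summed on a common inverse-depth axis, and the summed curve has a sharper, less ambiguous minimum than any single pair. The following is a synthetic 1-D sketch with invented names and data, not the implementation of [2]:

```python
import numpy as np

def sssd_inverse_depth(ref, others, baselines, focal, zetas, x, win=2):
    """Sum of SSDs over several baselines, evaluated per inverse depth zeta.

    Disparity for baseline B_i at inverse depth zeta is d_i = B_i * f * zeta,
    so all pairwise SSD curves share the zeta axis and can be accumulated.
    """
    patch = ref[x - win : x + win + 1]
    scores = np.zeros(len(zetas))
    for i, zeta in enumerate(zetas):
        for img, b in zip(others, baselines):
            d = int(round(b * focal * zeta))          # disparity for this pair
            cand = img[x + d - win : x + d + win + 1]
            scores[i] += np.sum((patch - cand) ** 2)
    return scores

# Synthetic 1-D scene: a bright feature at true depth z = 4 appears in each
# view shifted by b * f / z pixels.
f, z_true = 100.0, 4.0
ref = np.zeros(200)
ref[80:85] = 1.0
baselines = [0.2, 0.4, 0.6]
others = []
for b in baselines:
    img = np.zeros(200)
    d = int(round(b * f / z_true))
    img[80 + d : 85 + d] = 1.0
    others.append(img)

zetas = np.linspace(0.1, 0.5, 41)            # candidate inverse depths
scores = sssd_inverse_depth(ref, others, baselines, f, zetas, x=82)
z_est = 1.0 / zetas[np.argmin(scores)]       # recovers z = 4 up to grid spacing
```

A single short baseline would leave several near-tied minima here; summing over the three baselines leaves exactly one zero-cost candidate.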
Boyd and colleagues [4] developed a 3D modeling sys-
tem called multiple-perspective interactive video. Multiple-
baseline stereo performs template matching on the image
plane, while the method of Boyd et al. finds corresponding
¹ This means range information. However, our methods do not refer to range information directly.