Using panoramic environment maps and 3D object models with surface textures derived from natural imagery greatly facilitates photorealistic rendering. Taking a single snapshot spanning the whole scene of interest sounds easy, but an off-the-shelf camera might not have enough resolution to adequately capture the whole scene at once. Because of recent advances in automated video and image processing and the availability of powerful commodity computational platforms, even nonexpert users can now create image mosaics and panoramas by seamlessly stitching multiple images and video frames captured using handheld cameras. This article presents a complete approach for automatically constructing mosaics from images and video, constituting a practical end-to-end system. Most of the components we describe have been accessible to the computer vision community and, to some extent, to the graphics community in the recent past. However, this presentation aims to make all the components accessible to readers ranging from general users to the larger graphics, video, and vision communities.
Diverse applications such as high-resolution stills,¹ travel and real estate, virtual reality and telepresence,² motion-picture production, geographic information systems (GIS),³ and surgery⁴ could benefit from improvements in the quality and efficiency of image capture and mosaicking. Researchers have discussed many of these uses before,⁵ so we will instead highlight the generality of our approach, which encompasses planar and spherical mosaics (useful for consumer and professional/business applications) and geocoded terrestrial mosaics (useful for GIS and related systems).
Overview of the problem
Warping each frame of the input video or image sequence to a common surface and compositing the overlapping imagery forms a mosaic. Barring artistic intentions like David Hockney’s multiperspective photo collages,⁶ mosaics should be photometrically and spatially continuous, that is, free of seams that betray the underlying frames.
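To make the warp-and-composite step concrete, here is a minimal Python/NumPy sketch, assuming grayscale frames and already-known 3x3 homographies that map each frame onto a shared planar surface. The function name `warp_and_composite`, the nearest-neighbor inverse mapping, and the plain averaging of overlapping pixels are illustrative assumptions, not the compositing method of this article; averaging alone will not hide photometric differences between frames.

```python
import numpy as np

def warp_and_composite(frames, homographies, canvas_shape):
    """Warp frames onto a shared planar canvas and average where they overlap.

    frames: list of 2D float arrays (grayscale images).
    homographies: list of 3x3 arrays mapping frame (x, y, 1) to canvas coordinates.
    canvas_shape: (height, width) of the output mosaic.
    Returns the mosaic and a per-pixel count of contributing frames.
    """
    h, w = canvas_shape
    acc = np.zeros(h * w)    # running sum of pixel values per canvas pixel
    count = np.zeros(h * w)  # number of frames covering each canvas pixel
    # Homogeneous coordinates of every canvas pixel, in row-major order.
    ys, xs = np.mgrid[0:h, 0:w]
    canvas_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    for frame, H in zip(frames, homographies):
        # Inverse mapping: locate each canvas pixel's source position in this frame.
        src = np.linalg.inv(H) @ canvas_pts
        sx = np.rint(src[0] / src[2]).astype(int)  # nearest-neighbor sampling
        sy = np.rint(src[1] / src[2]).astype(int)
        fh, fw = frame.shape
        valid = (sx >= 0) & (sx < fw) & (sy >= 0) & (sy < fh)
        acc[valid] += frame[sy[valid], sx[valid]]
        count[valid] += 1
    mosaic = np.where(count > 0, acc / np.maximum(count, 1), 0.0)
    return mosaic.reshape(h, w), count.reshape(h, w)
```

For example, two 2x2 frames with intensities 1 and 3, the second shifted one pixel to the right, produce a 2x3 mosaic whose middle column (covered by both frames) averages to 2.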
We can achieve spatial continuity by controlling the warps so that the scene contents in overlapping areas align. When the camera makes a single unidirectional pass over the scene, we call this a 1D scan because we can essentially index the motion path and corresponding frame layout by one spatial parameter. Frames in a 1D scan are spatially contiguous if and only if they’re temporally contiguous, so we could construct their mosaic using a local alignment process that aligns only adjacent frames. This simple 1D frame-to-frame method is analogous to making a collage from printed photos by lining up and instant-gluing successively numbered pictures (see Figure 1a).
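The 1D frame-to-frame method amounts to composing pairwise transforms in time order. The sketch below assumes each pairwise alignment has already produced a 3x3 homography mapping frame k+1 into frame k, and treats frame 0 as the mosaic reference; `chain_pairwise` is a hypothetical helper, not part of any library.

```python
import numpy as np

def chain_pairwise(pairwise):
    """Compose frame-to-frame homographies into frame-to-mosaic transforms.

    pairwise[k] maps coordinates in frame k+1 to coordinates in frame k.
    Frame 0 defines the mosaic coordinate system, so its transform is identity.
    Returns a list of 3x3 arrays; entry k maps frame k into the mosaic.
    """
    transforms = [np.eye(3)]
    for H in pairwise:
        # Frame k+1 -> frame k -> ... -> frame 0: append the new link on the right.
        T = transforms[-1] @ H
        transforms.append(T / T[2, 2])  # keep the homogeneous scale normalized
    return transforms
```

Because each frame's placement is the product of every link before it, an error in one link carries unchanged into all later frames; this is the permanence of the instant glue.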
However, to capture a large field of view at high resolution with a modest-resolution sensor, the camera may need to traverse the scene with a 2D scan pattern, such as a zigzag or spiral, whose motion path requires two parameters to characterize well. In this case, temporally noncontiguous frames may be spatially contiguous, and simply gluing successive frames could cause trouble. Frames 1 and 6 in Figure 1b are likely to end up misaligned due to an accumulation of past errors, which we can’t correct later because the glue is permanent. In Figure 2a, for instance, we scanned the chapel’s facade in four up-down swaths of frames. Blindly applying the 1D frame-to-frame approach⁵ to the chapel creates the disjointed mosaic in Figure 2b.
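A toy simulation shows why chaining breaks down for 2D scans. It assumes translation-only motion, a hypothetical column-wise zigzag of two swaths of three frames (so the sixth frame lands beside the first, as an illustrative stand-in for Figure 1b), and a small constant error in every frame-to-frame measurement; `zigzag_positions` and `chained_estimate` are hypothetical helpers written only for this example.

```python
import numpy as np

def zigzag_positions(rows, cols, step=100.0):
    """True frame positions for a hypothetical column-wise zigzag scan."""
    pos = []
    for c in range(cols):
        # Even columns are scanned top to bottom, odd columns bottom to top.
        order = range(rows) if c % 2 == 0 else reversed(range(rows))
        for r in order:
            pos.append((c * step, r * step))
    return np.array(pos)

def chained_estimate(true_pos, bias):
    """Chain frame-to-frame offsets, each measured with the same error `bias`."""
    offsets = np.diff(true_pos, axis=0) + bias  # measured displacement per link
    # Frame 0 is the reference; every later frame accumulates all earlier links.
    return np.vstack([true_pos[:1], true_pos[0] + np.cumsum(offsets, axis=0)])

true = zigzag_positions(3, 2)
est = chained_estimate(true, np.array([0.5, -0.2]))
# Index 0 is "frame 1" and index 5 is "frame 6"; they are spatial neighbors.
print(est[5] - true[5])
```

With a per-link error of (0.5, -0.2) pixels, frame 6's chained position is off by (2.5, -1.0) pixels, so the seam between the spatial neighbors 1 and 6 is five times worse than any seam between consecutive frames.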
For a 2D scan, achieving continuity requires a 2D mosaicking method that aligns spatially overlapping frames, including those not temporally adjacent. In other words, we should apply slow-drying glue to all overlapping pairs and move the frames around to align everything reasonably well (see Figure 1c). Researchers in photogrammetry and computer vision have long recognized that we can define the optimal mosaic by minimizing the total alignment error in overlap regions simultaneously with respect to the placement parame-
0272-1716/02/$17.00 © 2002 IEEE
Image-Based Modeling, Rendering, and Lighting
44 March/April 2002
We present a complete approach for automated construction of mosaics from images and video using topology inference, local and global alignment, and compositing.
Steve Hsu, Harpreet S. Sawhney, and Rakesh Kumar
Sarnoff Corporation
Automated Mosaics via Topology Inference