Using panoramic environment maps and 3D object models with surface textures derived from natural imagery greatly facilitates photorealistic rendering. Taking a single snapshot spanning the whole scene of interest sounds easy, but an off-the-shelf camera might not have enough resolution to adequately capture the whole scene at once. Because of recent advances in automated video and image processing and the availability of powerful commodity computational platforms, even nonexpert users can now create image mosaics and panoramas by seamlessly stitching multiple image and video frames captured using handheld cameras.

This article presents a complete approach for automatically constructing mosaics from images and video, constituting a practical end-to-end system. Most of the components we describe have been accessible to the computer vision community and, to some extent, to the graphics community in the recent past. However, the current presentation attempts to make all the components accessible to readers that range from general users to the larger graphics, video, and vision communities.

Diverse applications like high-resolution stills [1], travel and real estate, virtual reality and telepresence [2], motion-picture production, geographic information systems (GIS) [3], and surgery [4] could benefit from enhancing the quality and efficiency of image capture and mosaicking. Researchers have discussed many of these uses before [5], so we will highlight the generality of our approach, encompassing planar and spherical mosaics (useful for consumer and professional/business applications) and geocoded terrestrial mosaics (useful for GIS and related systems).

Overview of the problem

Warping each frame of the input video or image sequence to a common surface and compositing the overlapping imagery forms a mosaic.
Barring artistic intentions like David Hockney's multiperspective photo collages [6], mosaics should be photometrically and spatially continuous, that is, free of seams that betray the underlying frames. We can achieve spatial continuity by controlling the warps so that the scene contents in overlapping areas align.

When the camera makes a single unidirectional pass over the scene, we call this a 1D scan because we can essentially index the motion path and corresponding frame layout by one spatial parameter. Frames in a 1D scan are spatially contiguous if and only if they're temporally contiguous, so we could construct their mosaic using a local alignment process that aligns only adjacent frames. This simple 1D frame-to-frame method is analogous to making a collage from printed photos by lining up and instant-gluing successively numbered pictures (see Figure 1a).

However, to capture a large field of view at high resolution with a modest-resolution sensor, the camera may need to traverse the scene with a 2D scan pattern, such as a zigzag or spiral, whose motion path requires two parameters to characterize well. In this case, temporally noncontiguous frames may be spatially contiguous. Simply gluing successive frames could cause trouble. Frames 1 and 6 in Figure 1b are likely to end up misaligned due to an accumulation of past errors, which we can't correct later because the glue is permanent. In Figure 2a, for instance, we scanned the chapel's facade in four up-down swaths of frames. Blindly applying the 1D frame-to-frame approach [5] to the chapel creates the disjointed mosaic in Figure 2b.

For a 2D scan, achieving continuity requires a 2D mosaicking method that aligns spatially overlapping frames, including those not temporally adjacent. In other words, we should apply slow-drying glue to all overlapping pairs and move the frames around to align everything reasonably well (see Figure 1c).
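The contrast between permanent and slow-drying glue can be illustrated with a small numerical sketch. It is not the method developed in this article. It assumes a translation-only motion model, a 2 x 3 zigzag layout loosely modeled on the six frames of Figure 1b, and synthetic Gaussian noise on the pairwise offsets; the helper names (zigzag_positions, overlapping_pairs, chain_align, global_align, total_error) are hypothetical and chosen only for illustration. The chained placement uses successive frames only, while the adjusted placement solves one least-squares problem over every overlapping pair.

```python
import numpy as np

def zigzag_positions(rows, cols):
    """Assumed true frame centers for a zigzag scan, listed in temporal order."""
    pts = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        pts.extend((float(c), float(r)) for c in cs)
    return np.array(pts)

def overlapping_pairs(pos, radius=1.01):
    """Pairs (i < j) whose centers lie within `radius`; a stand-in for image overlap."""
    n = len(pos)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if np.linalg.norm(pos[i] - pos[j]) <= radius]

def chain_align(n, meas):
    """1D frame-to-frame placement: accumulate offsets of successive frames only."""
    est = np.zeros((n, 2))
    for i in range(1, n):
        est[i] = est[i - 1] + meas[(i - 1, i)]
    return est

def global_align(n, meas):
    """Least-squares placement over all overlapping pairs, with frame 0 fixed at the origin."""
    pairs = list(meas)
    A = np.zeros((len(pairs), n - 1))   # unknowns: positions of frames 1..n-1
    b = np.zeros((len(pairs), 2))
    for k, (i, j) in enumerate(pairs):
        A[k, j - 1] = 1.0               # j > i >= 0, so j is never the fixed frame
        if i > 0:
            A[k, i - 1] = -1.0
        b[k] = meas[(i, j)]
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.vstack([np.zeros(2), sol])

def total_error(est, meas):
    """Sum of squared disagreements between placed and measured pairwise offsets."""
    return sum(float(np.sum((est[j] - est[i] - d) ** 2))
               for (i, j), d in meas.items())

rng = np.random.default_rng(0)
true_pos = zigzag_positions(rows=2, cols=3)          # indices 0..5 play frames 1..6
pairs = overlapping_pairs(true_pos)
# Synthetic pairwise offsets: true displacement plus small noise (an assumption).
meas = {(i, j): true_pos[j] - true_pos[i] + rng.normal(0.0, 0.05, 2)
        for i, j in pairs}

chained = chain_align(len(true_pos), meas)
adjusted = global_align(len(true_pos), meas)
print("overlapping pairs:", pairs)
print("total error, chained: ", total_error(chained, meas))
print("total error, adjusted:", total_error(adjusted, meas))
```

In this layout the first and last frames overlap even though they are five steps apart in time, so the pair (0, 5) appears among the overlapping pairs but is ignored by the chained placement. Because the chained placement is itself one candidate solution of the least-squares problem, the adjusted placement's total error cannot exceed it.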
Automated Mosaics via Topology Inference
Steve Hsu, Harpreet S. Sawhney, and Rakesh Kumar
Sarnoff Corporation

We present a complete approach for automated construction of mosaics from images and video using topology inference, local and global alignment, and compositing.

Image-Based Modeling, Rendering, and Lighting
March/April 2002, page 44. 0272-1716/02/$17.00 © 2002 IEEE

Researchers in photogrammetry and computer vision have long recognized that we can define the optimal mosaic by minimizing the total alignment error in overlap regions simultaneously with respect to the placement parame-