An Augmented-Reality-Based Real-Time Panoramic
Vision System for Autonomous Navigation
Sumantra Dasgupta and Amarnath Banerjee, Member, IEEE
Abstract—The paper discusses a panoramic vision system for
autonomous-navigation purposes. It describes an economical PC-based method for integrating data from multiple camera sources
in real time. The views from adjacent cameras are visualized
together as a panorama of the scene using a modified correla-
tion-based stitching algorithm. A separate operator is presented
with a particular slice of the panorama matching the user’s view-
ing direction. Additionally, a simulated environment is created
where the operator can choose to augment the video by simul-
taneously viewing an artificial three-dimensional (3-D) view of
the scene. Potential applications of this system include enhancing the quality and range of visual cues, and enabling navigation under hostile
circumstances where direct view of the environment is not possible
or desirable.
Index Terms—Augmented reality, autonomous navigation,
panoramic stitching, real-time video.
I. INTRODUCTION
PANORAMIC images [4] are often used in augmented
reality [1], digital photography, and autonomous naviga-
tion. There are various techniques for the creation of panoramic
images, as discussed in [3], [7], and [13]. Based on the amount
of time taken to render the panorama after the actual pic-
tures have been taken, one can categorize panoramic imaging
as real time or offline. Applications such as autonomous
navigation need real-time imaging, whereas applications such
as virtual walkthrough can work based on offline panoramic
imaging. Panoramic imaging can also be categorized based
on the capture devices used. One can use: 1) a static
(nonrotating) configuration of multiple cameras, as those used
for multibaseline stereo in [5] and [10]; 2) a revolving single
camera, as that in [2]; or 3) a combination of cameras and
mirrors, generally known as catadioptrics, as those in [6], [8],
[9], and [12]. In the first case, the final presentation of the
panorama is real time because images from various cameras
can be simultaneously captured at 30 frames/s, which leaves time for postprocessing (mainly stitching) while still meeting the real-time constraints. In the second case, the camera needs time
to revolve, which makes the effective capture rate fall dras-
tically below real-time requirements. In the third method, the
panorama of the scene is captured by reflection from the mirror
element and presented to the camera capturing frames at rates
of around 30 frames/s. Therefore, the presentation of the image can be made real time after postprocessing (here, postprocessing
is mainly that of mapping curvilinear coordinates to linear
coordinates).
We develop a method for integrating data from multiple cam-
era sources in real time. Our main effort has been to maintain a
high-resolution color (RGB) video panorama and yet maintain
a high frame rate. The camera setup is static with respect to the
base to which it is attached. Movement is manifested in the form
of motion of the base (e.g., cameras attached to an automobile).
This is referred to as direct motion of the camera setup. Movement
can also be manifested in the form of head movement of a
user wearing a head-mounted display (HMD). In this case, the
cameras do not move, but the rendered view follows the direction in which the user is looking. This is referred to as indirect motion.
Here, we consider a scenario where a user wearing an HMD
sits within a moving vehicle to which the cameras are rigidly
attached.
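As a minimal illustration of how the two kinds of motion combine, the following Python sketch (hypothetical names; the paper does not prescribe an interface) adds the vehicle's heading to the head yaw reported by the HMD tracker to obtain the absolute direction whose slice of the panorama should be rendered.

    # Minimal sketch: combining direct motion (vehicle heading) with
    # indirect motion (HMD head yaw) into one absolute viewing direction.
    # Names are illustrative assumptions, not the paper's interface.
    def viewing_direction(vehicle_yaw_deg, head_yaw_deg):
        """Return the absolute viewing direction in [0, 360).

        vehicle_yaw_deg -- heading of the camera base (direct motion)
        head_yaw_deg    -- HMD yaw relative to the vehicle (indirect motion)
        """
        return (vehicle_yaw_deg + head_yaw_deg) % 360.0

    # Example: vehicle heading 350 deg, head turned 30 deg to the right
    # of straight ahead -> the window is centered at 20 deg.
    assert viewing_direction(350.0, 30.0) == 20.0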
The procedure starts with the video sequence from various
cameras being stitched together to form a panoramic image.
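The stitching algorithm itself (a modified correlation-based method) is not reproduced here; as a rough, hypothetical sketch of the correlation idea, the following Python/NumPy code estimates the column overlap between two adjacent grayscale frames by maximizing normalized cross-correlation and joins them at the estimated seam.

    import numpy as np

    def best_overlap(left, right, min_ov=20, max_ov=200):
        # Find the overlap width (in columns) between the right edge of
        # `left` and the left edge of `right` that maximizes normalized
        # cross-correlation; equal-height grayscale frames are assumed.
        best, best_score = min_ov, -2.0
        for ov in range(min_ov, max_ov + 1):
            a = left[:, -ov:].ravel().astype(float)
            b = right[:, :ov].ravel().astype(float)
            a -= a.mean()
            b -= b.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom == 0.0:
                continue                      # flat region, no evidence
            score = float(a @ b) / denom
            if score > best_score:
                best, best_score = ov, score
        return best

    def stitch_pair(left, right):
        # Concatenate at the estimated seam; a real implementation
        # would also blend intensities across the seam.
        ov = best_overlap(left, right)
        return np.hstack([left, right[:, ov:]])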
Then, the image is rendered according to the viewpoint of the
user (or indirect motion) and the orientation of the van (direct
motion). This is referred to as windowing, and the rendered
view is referred to as the window view. Both the stitching and
the windowing algorithms run in real time. The window
view of the panoramic image is presented to the user for view-
ing. For monitoring purposes, the whole panoramic image sequence is also fed to a video display with an on-screen video control panel for a separate operator. An
advanced form of rendering/viewing is done by superposing
the real-time video onto a three-dimensional (3-D) model of the
view area using chroma keying. This adds a level of redundancy
and extra security for navigation purposes.
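A minimal sketch of the windowing and chroma-keying steps is given below, assuming the stitched panorama spans a full 360° horizontally and the 3-D model view is rendered at the same resolution as the video window; the function names, the green key color, and the tolerance are illustrative assumptions.

    import numpy as np

    def window_view(panorama, center_deg, fov_deg=60.0):
        # Extract the slice of a 360-degree panorama centered on
        # `center_deg` with a horizontal field of view of `fov_deg`;
        # the column indices wrap around the panorama's seam.
        h, w = panorama.shape[:2]
        win_w = int(round(w * fov_deg / 360.0))
        start = int(round(w * center_deg / 360.0)) - win_w // 2
        cols = np.arange(start, start + win_w) % w
        return panorama[:, cols]

    def chroma_key(model_view, video, key=(0, 255, 0), tol=60):
        # Superpose the real-time video on the rendered 3-D model view:
        # video pixels close to the key color are replaced by the model,
        # everything else stays live video.
        diff = np.abs(video.astype(int) - np.array(key)).sum(axis=2)
        mask = (diff < tol)[..., None]
        return np.where(mask, model_view, video).astype(video.dtype)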
For the present approach, a multiple-camera configuration
has been used because the configuration is smoothly scalable in
both horizontal and vertical directions. Moreover, the configu-
ration can be easily extended for stereo viewing. In the present
approach, multiple cameras are arranged in a static octagonal
setup. The method is described in the following sections.
II. CAMERA CONFIGURATION
For horizontal panoramic viewing, the cameras can be
arranged in a regular horizontal setup (Fig. 1). For vertical
viewing, the cameras can be stacked up (Fig. 2). For all-
round viewing, the cameras can be arranged in a regular 3-D
arrangement (Fig. 3). For all these setups, there will be blind
spots very close to the camera setup (precisely, at any distance closer than the point where the fields of view of adjacent cameras begin to overlap).
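To make the blind-spot remark concrete, the following sketch computes, under an assumed geometry of n cameras placed on a ring of radius r and aimed radially outward, the distance from the center beyond which the views of adjacent cameras overlap; anything closer can fall between two fields of view.

    import math

    def blind_spot_radius(n_cams, ring_radius, fov_deg):
        # Distance from the ring's center at which the field-of-view
        # edges of two adjacent cameras first intersect. Assumes the
        # cameras are aimed radially outward and that fov_deg exceeds
        # 360/n_cams degrees (otherwise coverage never closes).
        half_sep = math.pi / n_cams          # half angle between optical axes
        excess = math.radians(fov_deg) / 2.0 - half_sep
        if excess <= 0:
            raise ValueError("field of view too narrow for full coverage")
        return ring_radius * (math.cos(half_sep)
                              + math.sin(half_sep) / math.tan(excess))

    # Example: 8 cameras on a 0.2-m ring with 60-degree lenses give
    # gapless coverage beyond roughly 0.77 m from the center.
    print(round(blind_spot_radius(8, 0.2, 60.0), 2))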