An Augmented-Reality-Based Real-Time Panoramic
Vision System for Autonomous Navigation
Sumantra Dasgupta and Amarnath Banerjee, Member, IEEE
Abstract—The paper discusses a panoramic vision system for
autonomous-navigation purposes. It describes an economical PC-based method for integrating data from multiple camera sources
in real time. The views from adjacent cameras are visualized
together as a panorama of the scene using a modified correla-
tion-based stitching algorithm. A separate operator is presented
with a particular slice of the panorama matching the user’s view-
ing direction. Additionally, a simulated environment is created
where the operator can choose to augment the video by simul-
taneously viewing an artificial three-dimensional (3-D) view of
the scene. Potential applications of this system include enhancing the quality and range of visual cues, and enabling navigation under hostile
circumstances where direct view of the environment is not possible
or desirable.
Index Terms—Augmented reality, autonomous navigation,
panoramic stitching, real-time video.
I. INTRODUCTION
PANORAMIC images [4] are often used in augmented
reality [1], digital photography, and autonomous naviga-
tion. There are various techniques for the creation of panoramic
images, as discussed in [3], [7], and [13]. Based on the amount
of time taken to render the panorama after the actual pic-
tures have been taken, one can categorize panoramic imaging
as real time or offline. Applications such as autonomous
navigation need real-time imaging, whereas applications such
as virtual walkthrough can work based on offline panoramic
imaging. Panoramic imaging can also be categorized based
on the capture devices used. One can use: 1) a static
(nonrotating) configuration of multiple cameras, as those used
for multibaseline stereo in [5] and [10]; 2) a revolving single
camera, as that in [2]; or 3) a combination of cameras and
mirrors, generally known as catadioptrics, as those in [6], [8],
[9], and [12]. In the first case, the final presentation of the
panorama is real time because images from various cameras
can be simultaneously captured at 30 frames/s, which leaves time for postprocessing (mainly stitching) while still meeting the real-time constraints. In the second case, the camera needs time
to revolve, which makes the effective capture rate fall dras-
tically below real-time requirements. In the third method, the
panorama of the scene is captured by reflection from the mirror
element and presented to the camera capturing frames at rates
of around 30 frames/s. Therefore, the presentation of the image can be made real time after postprocessing (here, postprocessing
is mainly that of mapping curvilinear coordinates to linear
coordinates).
We develop a method for integrating data from multiple cam-
era sources in real time. Our main effort has been to maintain a
high-resolution color (RGB) video panorama and yet maintain
a high frame rate. The camera setup is static with respect to the
base to which it is attached. Movement is manifested in the form
of motion of the base (e.g., cameras attached to an automobile).
This is referred to as direct motion of the camera setup. Movement
can also be manifested in the form of head movement of a
user wearing a head-mounted display (HMD). In this case, the
cameras do not move, but the rendered view follows the direction in which the user is looking. This is referred to as indirect motion.
Here, we consider a scenario where a user wearing an HMD
sits within a moving vehicle to which the cameras are rigidly
attached.
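As a minimal illustration of how the two kinds of motion combine, the following Python sketch (hypothetical names; the paper does not prescribe an interface) adds the vehicle's heading to the head yaw reported by the HMD tracker to obtain the absolute direction whose slice of the panorama should be rendered.

    # Minimal sketch: combining direct motion (vehicle heading) with
    # indirect motion (HMD head yaw) into one absolute viewing direction.
    # Names are illustrative assumptions, not the paper's interface.
    def viewing_direction(vehicle_yaw_deg, head_yaw_deg):
        """Return the absolute viewing direction in [0, 360).

        vehicle_yaw_deg -- heading of the camera base (direct motion)
        head_yaw_deg    -- HMD yaw relative to the vehicle (indirect motion)
        """
        return (vehicle_yaw_deg + head_yaw_deg) % 360.0

    # Example: vehicle heading 350 deg, head turned 30 deg to the right
    # of straight ahead -> the window is centered at 20 deg.
    assert viewing_direction(350.0, 30.0) == 20.0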
The procedure starts with the video sequence from various
cameras being stitched together to form a panoramic image.
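The stitching algorithm itself (a modified correlation-based method) is not reproduced here; as a rough, hypothetical sketch of the correlation idea, the following Python/NumPy code estimates the column overlap between two adjacent grayscale frames by maximizing normalized cross-correlation and joins them at the estimated seam.

    import numpy as np

    def best_overlap(left, right, min_ov=20, max_ov=200):
        # Find the overlap width (in columns) between the right edge of
        # `left` and the left edge of `right` that maximizes normalized
        # cross-correlation; equal-height grayscale frames are assumed.
        best, best_score = min_ov, -2.0
        for ov in range(min_ov, max_ov + 1):
            a = left[:, -ov:].ravel().astype(float)
            b = right[:, :ov].ravel().astype(float)
            a -= a.mean()
            b -= b.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom == 0.0:
                continue                      # flat region, no evidence
            score = float(a @ b) / denom
            if score > best_score:
                best, best_score = ov, score
        return best

    def stitch_pair(left, right):
        # Concatenate at the estimated seam; a real implementation
        # would also blend intensities across the seam.
        ov = best_overlap(left, right)
        return np.hstack([left, right[:, ov:]])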
Then, the image is rendered according to the viewpoint of the
user (or indirect motion) and the orientation of the van (direct
motion). This is referred to as windowing, and the rendered
view is referred to as the window view. Both the stitching and
the windowing algorithms run in real time. The window
view of the panoramic image is presented to the user for view-
ing. For monitoring purposes, the whole panoramic image sequence is also fed to a video display with an on-screen video control panel for a separate operator. An
advanced form of rendering/viewing is done by superposing
the real-time video onto a three-dimensional (3-D) model of the
view area using chroma keying. This adds a level of redundancy
and extra security for navigation purposes.
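A minimal sketch of the windowing and chroma-keying steps is given below, assuming the stitched panorama spans a full 360° horizontally and the 3-D model view is rendered at the same resolution as the video window; the function names, the green key color, and the tolerance are illustrative assumptions.

    import numpy as np

    def window_view(panorama, center_deg, fov_deg=60.0):
        # Extract the slice of a 360-degree panorama centered on
        # `center_deg` with a horizontal field of view of `fov_deg`;
        # the column indices wrap around the panorama's seam.
        h, w = panorama.shape[:2]
        win_w = int(round(w * fov_deg / 360.0))
        start = int(round(w * center_deg / 360.0)) - win_w // 2
        cols = np.arange(start, start + win_w) % w
        return panorama[:, cols]

    def chroma_key(model_view, video, key=(0, 255, 0), tol=60):
        # Superpose the real-time video on the rendered 3-D model view:
        # video pixels close to the key color are replaced by the model,
        # everything else stays live video.
        diff = np.abs(video.astype(int) - np.array(key)).sum(axis=2)
        mask = (diff < tol)[..., None]
        return np.where(mask, model_view, video).astype(video.dtype)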
For the present approach, a multiple-camera configuration
has been used because the configuration is smoothly scalable in
both horizontal and vertical directions. Moreover, the configu-
ration can be easily extended for stereo viewing. In the present
approach, multiple cameras are arranged in a static octagonal
setup. The method is described in the following sections.
II. CAMERA CONFIGURATION
For horizontal panoramic viewing, the cameras can be
arranged in a regular horizontal setup (Fig. 1). For vertical
viewing, the cameras can be stacked up (Fig. 2). For all-
round viewing, the cameras can be arranged in a regular 3-D
arrangement (Fig. 3). For all these setups, there will be blind
spots very close to the camera setup (precisely, at any distance closer than the point where the fields of view of adjacent cameras begin to overlap).
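To make the blind-spot remark concrete, the following sketch computes, under an assumed geometry of n cameras placed on a ring of radius r and aimed radially outward, the distance from the center beyond which the views of adjacent cameras overlap; anything closer can fall between two fields of view.

    import math

    def blind_spot_radius(n_cams, ring_radius, fov_deg):
        # Distance from the ring's center at which the field-of-view
        # edges of two adjacent cameras first intersect. Assumes the
        # cameras are aimed radially outward and that fov_deg exceeds
        # 360/n_cams degrees (otherwise coverage never closes).
        half_sep = math.pi / n_cams          # half angle between optical axes
        excess = math.radians(fov_deg) / 2.0 - half_sep
        if excess <= 0:
            raise ValueError("field of view too narrow for full coverage")
        return ring_radius * (math.cos(half_sep)
                              + math.sin(half_sep) / math.tan(excess))

    # Example: 8 cameras on a 0.2-m ring with 60-degree lenses give
    # gapless coverage beyond roughly 0.77 m from the center.
    print(round(blind_spot_radius(8, 0.2, 60.0), 2))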