A Visual Positioning System for Vehicle Navigation

Huei-Yung Lin, Jen-Hung Lin and Ming-Liang Wang

Abstract: Localization of a vehicle is a key component of driving assistance and autonomous navigation. In this work, we propose a visual positioning system (VPS) for vehicle or mobile robot navigation. Unlike general landmark-based or model-based approaches, which rely on predefined landmarks or a priori information about the environment, our approach makes no assumptions about prior knowledge of the scene. A stereo-based vision system is built for both extracting feature correspondences and recovering 3-D information of the scene from image sequences. The relative camera motion is then estimated by registering the 3-D feature points from two consecutive image frames. The vehicle is finally localized with reference to its initial position.

I. INTRODUCTION

Localization and tracking of a vehicle are major components for providing positions, directions and travel information to the driver. They also serve as key technologies for building autonomous navigation systems. Many approaches have been proposed for locating a mobile robot or a vehicle based on various techniques. The most commonly used methods include dead-reckoning techniques, navigation using active beacons, landmark-based navigation, map-based navigation, global positioning system (GPS), and vision-based positioning.

Dead-reckoning is a procedure for determining the present location of a vehicle by advancing some previous position through known path and velocity information over a given period of time. Since an odometer and optical sensors for wheel direction detection are easily installed on a vehicle, dead-reckoning is usually a less expensive method for vehicle localization. However, the integration of incremental motion information over time leads to an accumulation of errors.
In the case of long-distance travel, the accumulated orientation error makes the position error diverge as driving time increases.

Active beacons such as laser, sonar or radio can be used as media for vehicle or mobile robot navigation. These approaches use triangulation to measure the distances between a number of beacons and the mobile platform and then determine the current location. The problems associated with this technique are the inaccuracy of the distance measurement caused by signal time delay, and the installation and maintenance cost of the large number of beacons required to cover an area.

Satellite-based differential global positioning system (DGPS) is probably the most advanced and accurate method for identifying the position and orientation of an object. However, DGPS does not work well if the satellite signals are blocked. This situation commonly occurs in indoor environments or in urban areas with tall buildings.

Huei-Yung Lin, Jen-Hung Lin and Ming-Liang Wang are with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, 621, Taiwan. E-mail: lin@ee.ccu.edu.tw

In this work, we propose a visual positioning system (VPS) for vehicle or mobile robot navigation. Unlike general landmark-based or model-based approaches, which rely on predefined landmarks or a priori information about the environment (e.g., 3-D models of buildings, objects, etc.), our approach makes no assumptions about prior knowledge of the scene. A stereo-based vision system is built for both extracting feature correspondences and recovering 3-D information of the scene from image sequences. We develop a robust feature tracking method that simultaneously considers the inter-frame images from the same camera and the stereo image pair from different cameras at a fixed time instant. The relative camera motion is then estimated by registering the 3-D feature points from two consecutive image frames.
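The registration step mentioned above can be illustrated with a small sketch. The text at this point does not say which registration algorithm the system uses, so the example below assumes a standard least-squares rigid alignment computed with the SVD (often called the Kabsch method), applied to 3-D points that have already been matched between two frames. The function name `estimate_rigid_motion` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def estimate_rigid_motion(points_prev, points_curr):
    """Least-squares rigid alignment of matched 3-D points (assumed method).

    Finds a rotation R and translation t minimizing
    sum_i || R @ points_prev[i] + t - points_curr[i] ||^2.
    Both inputs are (N, 3) arrays with row i of one matched to row i
    of the other, and N >= 3 non-collinear points.
    """
    P = np.asarray(points_prev, dtype=float)
    Q = np.asarray(points_curr, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3 or P.shape[0] < 3:
        raise ValueError("expected two matched (N, 3) arrays with N >= 3")

    # Remove the centroids so that only the rotation remains to be solved.
    centroid_p = P.mean(axis=0)
    centroid_q = Q.mean(axis=0)
    H = (P - centroid_p).T @ (Q - centroid_q)  # 3x3 cross-covariance matrix

    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1) in the SVD solution.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_q - R @ centroid_p
    return R, t


# Synthetic check: rotate points by 5 degrees about the y-axis and shift them.
rng = np.random.default_rng(0)
pts_prev = rng.uniform(-5.0, 5.0, size=(20, 3))
angle = np.deg2rad(5.0)
R_true = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t_true = np.array([0.1, 0.0, 0.5])
pts_curr = pts_prev @ R_true.T + t_true

R_est, t_est = estimate_rigid_motion(pts_prev, pts_curr)
```

If the scene is static and the points are expressed in the camera frame, the recovered (R, t) describes how the scene points appear to move. The camera pose relative to the previous frame is then the inverse transform, with rotation `R.T` and position `-R.T @ t`. With real, noisy matches an outlier rejection step would normally precede this alignment; none is included in this sketch.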
Our system includes the following basic modules:

Feature point extraction: Automatically detect the feature points in each image.
Feature correspondence matching: Establish the feature correspondences for both inter-frame and inter-camera images.
Position estimation: Estimate the camera location based on the correspondences of the stereo image sequences.
Position refinement: Derive more accurate location information based on the sequence of location estimates and additional constraints.

II. FEATURE POINT EXTRACTION

Given a sequence of images captured by a video camera, the first step of 3-D scene analysis consists of selecting candidate features in one or more images for tracking or matching them across different views. Generally speaking, there are two important criteria for feature point selection. First, the features corresponding to the same scene points should be extracted consistently over time (i.e., across the different views in the sequence of images). Second, there should be enough information in the neighborhood of the points so that the corresponding points can be automatically matched.

In the past few decades, a great deal of work on feature extraction has been done, and several approaches have been reported in the literature [11], [3], [6]. In this research, the Harris corner detector [6] is used to extract the feature points. The main idea is to threshold the value

$$R(x, y) = \det(C) - k\,\operatorname{trace}^2(C) \tag{1}$$

where k is a (usually small) number used to control the gradient variation in different directions, and the matrix C