Indoor Positioning with a Single Camera and 3D Maps

Xun Li, Jinling Wang, Aire Olesk, Nathan Knight, Weidong Ding
School of Surveying and Spatial Information Systems
University of New South Wales
Sydney, Australia
xun.li@student.unsw.edu.au

Abstract—In this paper, we propose a method of vision-based positioning that uses a single camera and newly defined 3D maps for indoor localization and navigation. Our work addresses the accuracy and reliability concerns of indoor navigation systems. The main contribution is the adoption of a photogrammetric 6DOF pose estimation method to improve positioning accuracy. Dilution of precision (DOP) measures are introduced to evaluate positioning precision in the vision-based domain. Quality control strategies are also applied to detect outliers in the observations and strengthen system reliability. In addition, only natural landmarks are required in the proposed method to provide absolute position and orientation information.

Keywords—indoor positioning; single camera; 3D map; accuracy; reliability

I. INTRODUCTION

Navigation and localization services have never been more accessible. Most of them depend on traditional satellite-based positioning techniques and can therefore only be provided in outdoor environments (with direct GNSS visibility). Much work has been done to extend this level of positioning capability to indoor environments. One of the primary concerns is to find alternative positioning techniques that can provide localization and navigation services that are both accurate and reliable indoors. Approaches include RF, ultrasonic, vision, infrared (IR) and so forth [1-3]. Among these sensors, vision is an effective and attractive means of finding the 3D position of a target. Vision systems are self-contained, as no external infrastructure (e.g. beacons, radio stations) is required. Furthermore, they provide absolute location information without accumulating errors.
With these virtues, vision-based navigation began to attract attention in the late twentieth century, and a large body of research has accumulated since. To our knowledge, most related work has come from the mobile robot community, which has been the most active research domain in this area. Depending on whether one or more cameras are used, either map-based or mapless navigation is performed [4]. The common approach has been to use the images captured by an on-board vision sensor to determine the position, and possibly the orientation, of the vision system. Much emphasis has been placed on enabling a robot to navigate safely and effectively in an indoor environment with a high level of autonomy. However, as long as navigation proceeds without failure (hitting an obstacle or failing to reach the destination), the self-localization (positioning) process is considered satisfactory. The accuracy and reliability of the positioning itself has rarely received much attention or been fully investigated. This is especially true of monocular vision approaches. In [5] the authors used the difference between the current image and a pre-recorded image sequence to continuously estimate the robot's position and orientation shifts. While the orientation change can be obtained with relatively high accuracy, the position change may not be accurately estimated. Another limitation of this approach is its assumption that the correspondence between the current image and the reference image is always found correctly, so mismatches remain a severe threat to the reliability of the whole system. Two years later, Ohya, Kosaka and Kak matched edges from currently acquired images against a 3D edge model to achieve self-localization [6]. A step forward from the previous attempt is that they used a predetermined threshold to keep the position error bounded, albeit with the help of dead reckoning.
Visual input alone fails to provide accurate positioning results. In 2003, a new algorithm for image-based robot navigation was proposed [7]. The core idea is to estimate the translation and rotation of the robot by matching the target image with images taken in real time. Beyond this idea, another contribution of the approach is the use of the RANSAC paradigm to deal with outliers caused by mismatches. However, it is not without limitations: the algorithm provides only three degrees of freedom, and in several cases not enough correct matches can be found to compute the position shift. The same limitation in degrees of freedom can be found in some model-based approaches [8]. Mathematical models have been developed to improve the accuracy and reliability of such systems by fusing odometry information. A recent study by Hayashi and Kinoshita [9] developed an indoor navigation system based mainly on visual input from a monocular camera and a 2D space map. Self-localization was achieved by calculating the position of the robot relative to straight lines recognized on both sides of the corridor, but the work did not give explicit information on the accuracy or reliability achievable with this method. It can be seen that available approaches are still far from mature enough to provide a robust indoor positioning and navigation solution, in the sense that the full set of degrees of freedom should be provided with a high level of accuracy and reliability.
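To illustrate how the RANSAC paradigm mentioned above copes with mismatched correspondences, the following Python sketch estimates a planar (2D) rotation and translation from point matches deliberately contaminated with gross outliers. It is a minimal illustration of the general RANSAC idea, not the algorithm of [7]; all function names are our own, and the minimal model is fitted with the standard SVD-based (Kabsch) solution.

```python
import numpy as np

def rigid_from_pairs(src, dst):
    """Least-squares 2D rotation + translation (Kabsch/SVD) from matched points."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def ransac_rigid(src, dst, iters=500, tol=0.05, seed=None):
    """RANSAC loop: fit minimal 2-point samples, keep the model with most inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=2, replace=False)
        R, t = rigid_from_pairs(src[idx], dst[idx])
        resid = np.linalg.norm(dst - (src @ R.T + t), axis=1)
        inliers = resid < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on the consensus set for the final estimate
    return rigid_from_pairs(src[best_inliers], dst[best_inliers]) + (best_inliers,)

# Synthetic demonstration: 40 matches, 8 of them gross mismatches.
data_rng = np.random.default_rng(0)
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([1.0, -0.5])
src = data_rng.uniform(-5, 5, (40, 2))
dst = src @ R_true.T + t_true
dst[:8] += data_rng.uniform(-3, 3, (8, 2))   # corrupt 8 correspondences

R, t, inliers = ransac_rigid(src, dst, seed=1)
```

A plain least-squares fit over all 40 matches would be dragged off by the 8 mismatches; the consensus step rejects them, which is precisely the reliability benefit that motivated the use of RANSAC in [7].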