Selecting Local Region Descriptors with a Genetic Algorithm for Real-World Place Recognition

Leonardo Trujillo 1, Gustavo Olague 1, Francisco Fernández de Vega 2, and Evelyne Lutton 3

1 EvoVisión Project, CICESE Research Center, Ensenada, B.C. México
2 Grupo de Evolución Artificial, Universidad de Extremadura, Mérida, Spain
3 APIS Team, INRIA-Futurs, Parc Orsay Université 4, ORSAY Cedex, France

trujillo@cicese.mx, olague@cicese.mx, fcofdez@unex.es, evelyne.lutton@inria.fr

Abstract. The basic problem for a mobile vision system is determining where it is located within the world. In this paper, a recognition system is presented that is capable of identifying known places such as rooms and corridors. The system relies on a bag of features approach using locally prominent image regions. Real-world locations are modeled using a mixture of Gaussians representation, thus allowing for a multimodal scene characterization. Local regions are represented by a set of 108 statistical descriptors computed from different modes of information. From this set the system needs to determine which subset of descriptors captures regularities between image regions of the same location, and also discriminates between regions of different places. A genetic algorithm is used to solve this selection task, using a fitness measure that promotes: 1) a high classification accuracy; 2) the selection of a minimal subset of descriptors; and 3) a high separation among place models. The approach is tested on two real-world examples: a) using a sequence of still images with 4 different locations; and b) a sequence that contains 8 different locations. Results confirm the ability of the system to identify previously seen places in a real-world setting.

1 Introduction

Building an artificial system that is capable of answering the question “Where am I?” is one of the central problems studied in computer vision research.
This task has only been partially solved in constrained real-world situations. To solve an instance of this problem, and many vision problems in general, four design issues must be accounted for [1]: 1) What information should be extracted from the output of visual sensors? 2) How is the information extracted? 3) How should the information be represented? 4) How will it be used to solve higher-level tasks?

This contribution introduces a system that performs place recognition using only local image information and probabilistic models for each location. The design questions stated above are addressed in the following manner. Questions 1 and 4 are answered using common computer vision techniques that are applicable to different types of problems. The extracted information consists of local image regions, which corresponds to a bag of features approach [2,3,4]. In this way, the system can be robust to occlusions and avoids the need for prior segmentation. The information gathered from the images is used to create

M. Giacobini et al. (Eds.): EvoWorkshops 2008, LNCS 4974, pp. 325–334, 2008. © Springer-Verlag Berlin Heidelberg 2008