MVA '90 IAPR Workshop on Machine Vision Applications, Nov. 28-30, 1990, Tokyo

MAP-DRIVEN IMAGE INTERPRETATION BY ASSOCIATIVE MODEL INDEXING

Gianluca Foresti, Vittorio Murino, Carlo S. Regazzoni, Rodolfo Zunino
Dept. of Biophysical and Electronic Engineering, University of Genoa
Via all'Opera Pia 11A, I-16145 Genoa, Italy

The problem of integrating territorial information within a multisensor vision system for autonomous-vehicle control is addressed. Environmental information is used to improve recognition results and to locate the vehicle's position in the coordinate reference frame of a map. To this end, a hypothesis-and-test search mechanism has been developed, which is based on an associative phase followed by a symbolic one. In particular, an associative memory is first used to address the possible territorial area where the scene under examination may have been acquired. This guess is then verified by a symbolic recognition system using a model-driven strategy.

The integration of multiple information sources is essential for obtaining an accurate recognition of 3D outdoor scenes, especially when controlling an autonomous vehicle. In this paper, we address the problem of integrating territorial information into a multisensor vision system for autonomous driving. A set of synthetic images, representing significant viewframes reconstructed from an a-priori fixed route on a territorial map, is first stored in an associative memory [9]. This process represents the training phase of the associative memory. Images acquired by a multisensor set-up are then processed by the associative memory in order to produce an estimate of the vehicle position inside the map reference frame. This strategy makes it possible to organize the search space so that a search over the whole model space is avoided, which improves computational performance.
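The store-then-recall idea described above can be illustrated with a small sketch. The associative memory actually used follows [9] and is not specified in this part of the paper, so the code below swaps in a plain correlation-based nearest-pattern lookup as a stand-in. The class `ViewframeMemory`, the function `normalized_correlation`, the toy binary "images", and the map positions are all hypothetical names and values chosen for illustration.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Cosine similarity between two flattened images (sequences of numbers)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ViewframeMemory:
    """Toy stand-in for the associative memory (ASSUMPTION: not the model of [9]).

    Training stores (synthetic viewframe, map position) pairs; recall returns
    the position attached to the stored viewframe most similar to the input.
    """

    def __init__(self):
        self._items = []

    def store(self, viewframe, position):
        # Training phase: memorize one synthetic viewframe and its map position.
        self._items.append((list(viewframe), position))

    def recall(self, image):
        # Recall phase: pick the best-matching stored viewframe.
        if not self._items:
            raise ValueError("memory is empty")
        scored = [(normalized_correlation(v, image), p) for v, p in self._items]
        score, position = max(scored, key=lambda pair: pair[0])
        return position, score


if __name__ == "__main__":
    memory = ViewframeMemory()
    memory.store([1, 1, 0, 0, 0, 0, 1, 1], (10.0, 5.0))  # hypothetical viewframes
    memory.store([0, 0, 1, 1, 1, 1, 0, 0], (20.0, 7.5))
    memory.store([1, 0, 1, 0, 1, 0, 1, 0], (30.0, 9.0))
    # A degraded acquisition of the second viewframe.
    print(memory.recall([0, 0, 1, 1, 1, 0, 0, 0]))
```

The recalled position is only a hypothesis: it narrows the set of models to be examined, and still has to be confirmed by the symbolic verification step.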
This initial guess gives a position estimate, which is then verified by the recognition system by looking for objects associated with the viewframe. This process is performed at a high abstraction level and consists of an expectation-driven search starting from symbolic object descriptions and using a version of a distributed blackboard system for recognition [4], into which a module devoted to scene analysis has been inserted.

The paper is organized into four sections. Section II deals with a general formulation of the problem, pointing out the characteristics of the sensors employed. Section III contains a brief review of associative memory techniques, and Section IV describes the model employed here and reports preliminary results obtained on a set of real images and on the related territorial map.

2 THE INTERPRETATION PROBLEM

This section deals with the general formulation of the recognition process. It is explained how data provided by a terrain map are transformed so that they can be fused with data acquired with a TV camera. Then, the recognition process performed at the symbolic level is described.

2.1 Cartographic virtual sensor

A topologic map (TM) representing a scenario through which an autonomous vehicle can travel provides useful information to be used by a multisensor recognition system. Two intermediate steps have to be performed to obtain a representation of the information contained in the map that can be compared directly with data provided by visual sensors: first, a 3D model of the environment must be obtained; then, an observation model must be provided that allows the system to simulate the acquisition of data as similar as possible to those coming from the visual sensor.

2.2 3D model

Starting from a digitized image, like the one in Fig. 2, it is possible to detect two kinds of basic primitives which can be used to describe the environment model: lines located at equal height (i.e., the so-called contour lines) and landmarks (i.e., significant patterns which can be recognized by the system; see Fig. 3).

Using processing techniques and reconstruction algorithms (whose descriptions go beyond the scope of this paper), it is possible to obtain from contour lines a 3D map in the form of a matrix, $F(x,y) = Z$, where $Z$ is the height computed at point $(x,y)$ of TM. The next step is to place landmarks on the 3D ground map. Landmarks are usually characterized by regular shapes (e.g., as a first approximation, houses can be represented as parallelepipeds). A generic landmark $L_i$ is associated with the matrix $L_i(x,y) = Z'$. Then, a complete a-priori environment model, $F^*$, can be obtained through an appropriate transformation of $F$. This operation is called landmark positioning, and can be modeled as a transformation over the original ground map that takes its landmarks into account. In this way, one can obtain a representation of the information contained in the map by indicating the contour lines and some characteristic objects which can be observed during the mission.

2.3 The observation model

The environment model $F^*$ can be observed with a camera emulator $C$, which takes as input the coordinates of the viewpoint $(x_0, y_0)$, the axis of the visual direction ($Z_v$), and a vector of camera parameters $P$ (e.g., depth of field, focus, etc.). The camera emulator performs a perspective transformation which allows one to obtain a 2D view ($F_{2D}$):

$$C\big((x_0, y_0),\, Z_v,\, P\big) \longrightarrow F_{2D}(x', y')$$
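As a rough illustration of landmark positioning and of the camera emulator, the following minimal sketch works on a height matrix stored as nested lists. The paper does not give either transformation explicitly, so several assumptions are made and should be read as such: landmark heights are simply added to the ground height over the landmark footprint, the visual direction is reduced to a heading angle in the ground plane, the viewpoint is an integer grid cell, and the parameter vector is reduced to a focal length and an eye height. The names `position_landmarks`, `camera_emulator`, `focal`, and `eye_height` are hypothetical, and no occlusion handling is attempted.

```python
from math import cos, sin


def position_landmarks(ground, landmarks):
    """Return an environment model built from a ground map and landmarks.

    ground: list of rows, ground[y][x] = Z (terrain height).
    landmarks: list of (x0, y0, heights), where heights[dy][dx] = Z' is the
        landmark height over its footprint, anchored at cell (x0, y0).
    ASSUMPTION: heights are added to the terrain; the source only says that
    an "appropriate transformation" of the ground map is applied.
    """
    model = [row[:] for row in ground]  # leave the original map untouched
    for x0, y0, heights in landmarks:
        for dy, row in enumerate(heights):
            for dx, z in enumerate(row):
                model[y0 + dy][x0 + dx] += z
    return model


def camera_emulator(model, viewpoint, heading, focal=1.0, eye_height=1.5):
    """Project model cells onto an image plane with a pinhole camera.

    viewpoint: (x0, y0) grid cell; heading: viewing direction in radians,
    measured in the ground plane from the +x axis (ASSUMPTION).
    Returns a list of (x', y', depth) for cells in front of the camera.
    """
    x0, y0 = viewpoint
    z0 = model[y0][x0] + eye_height  # camera height above the datum
    c, s = cos(heading), sin(heading)
    view = []
    for y, row in enumerate(model):
        for x, z in enumerate(row):
            dx, dy = x - x0, y - y0
            depth = dx * c + dy * s      # distance along the viewing axis
            lateral = -dx * s + dy * c   # offset perpendicular to the axis
            if depth <= 0:
                continue                 # behind (or at) the camera
            view.append((focal * lateral / depth,
                         focal * (z - z0) / depth,
                         depth))
    return view


if __name__ == "__main__":
    ground = [[0.0] * 5 for _ in range(3)]                 # flat 5x3 terrain
    model = position_landmarks(ground, [(3, 1, [[3.0]])])  # one 3 m "house"
    for point in camera_emulator(model, (0, 1), 0.0, eye_height=1.0):
        print(point)
```

A synthetic viewframe for the associative memory could then be rendered from such projected points, although the rendering step itself is outside what this sketch covers.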