Active/Dynamic Stereo for Navigation

Enrico Grosso, Massimo Tistarelli and Giulio Sandini
University of Genoa
Department of Communication, Computer and Systems Science
Integrated Laboratory for Advanced Robotics (LIRA-Lab)
Via Opera Pia 11A - 16145 Genoa, Italy

Abstract. Stereo vision and motion analysis have frequently been used to infer scene structure and to control the movement of a mobile vehicle or a robot arm. Unfortunately, when considered separately, these methods present intrinsic difficulties, and a simple fusion of their respective results has proved insufficient in practice. This paper presents a cooperative schema in which binocular disparity is computed for corresponding points in several stereo frames and used, together with optical flow, to compute the time-to-impact. The formulation of the problem takes into account translation of the stereo set-up and rotation of the cameras while tracking an environmental point and performing one-degree-of-freedom active vergence control. Experiments on a stereo sequence from a real scene are presented and discussed.

1 Introduction

Visual coordination of actions is essentially a real-time problem. It is increasingly clear that many complex operations can rely on reflexes to visual stimuli [Bro86]. For example, closed-loop visual control has been implemented at about video rate for obstacle detection and avoidance [FGMS90], target tracking [CGS91] and gross shape understanding [TK91]. In this paper we face the problem of "visual navigation". The main goal is to perform task-driven measurements of the scene, detecting corridors of free space along which the robot can safely navigate. The proposed cooperative schema uses binocular disparity, computed on several image pairs and over time. In the past, the problem of fusing motion and stereo in a mutually useful way has been addressed by several researchers.
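The role of binocular disparity in the schema above rests on the standard triangulation relation for a parallel stereo pair, Z = fB/d. The following is a minimal illustrative sketch, not the paper's implementation; the focal length and baseline values are hypothetical placeholders.

```python
def depth_from_disparity(d, f=500.0, B=0.1):
    """Depth Z from horizontal disparity d (pixels) for a parallel
    stereo pair: Z = f * B / d, with f in pixels and B in metres.
    The values of f and B here are illustrative, not from the paper."""
    if d == 0:
        raise ValueError("zero disparity: point at infinity")
    return f * B / d

# A nearby point yields a large disparity, a distant point a small one:
near = depth_from_disparity(50.0)   # 1.0 m
far = depth_from_disparity(5.0)     # 10.0 m
```

Because depth is inversely proportional to disparity, tracking how disparity changes over several stereo frames carries the same information as tracking depth over time, which is what the cooperative schema exploits.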
Nevertheless, there is a great difference between approaches in which the results of the two modalities are considered separately (for instance, using depth from stereo to compute motion parameters [Mut86]) and the rather different approaches based upon more integrated relations (for instance, the temporal derivative of disparity [WD86, LD88]). In the following we explain how stereo disparity and image velocity are combined to obtain a 2½D representation of the scene, suitable for visual navigation, expressed either in terms of time-to-impact or of relative depth referred to the distance of the cameras from the fixation point. Only image-derived quantities are used, except for the vergence angles of the cameras, which could be actively controlled during the robot motion [OC90] and can be measured directly on the motors (with optical encoders). Generalizing previous work [TGS91], we also consider a rotational motion of the cameras around the vertical axes and derive, from the temporal correspondence of image points, the relative rotation of the stereo baseline. This rotation is then used to correct the optical flow or the relative depth.

* This work has been partially funded by the Esprit projects P2502 VOILA and P3274 FIRST.
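To make the "integrated relation" concrete: for translation along the optical axis, d = fB/Z implies that the time-to-impact Z/|Ż| equals d divided by its temporal derivative, so it can be read directly from image measurements without recovering metric depth. This is a minimal sketch of that relation (cf. the temporal derivative of disparity [WD86]), not the full schema of this paper, which additionally corrects for camera rotation; the function name and the finite-difference approximation are illustrative assumptions.

```python
def time_to_impact(d, d_next, dt=1.0):
    """Time-to-impact from the temporal derivative of disparity.
    For translation along the optical axis, d = f*B/Z gives
    tau = Z/|Zdot| = d / (dd/dt); f and B cancel, so only
    image-derived quantities are needed. Illustrative sketch."""
    d_dot = (d_next - d) / dt  # forward finite difference
    if d_dot <= 0:
        raise ValueError("disparity not increasing: no approach along the axis")
    return d / d_dot

# Disparity growing from 10 to 11 pixels in one frame interval
# implies impact in about 10 frame intervals:
tau = time_to_impact(10.0, 11.0)
```

Note that the units of the result follow dt: with dt in frames the time-to-impact is in frames, independent of the (unknown) camera speed and baseline.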