IMAGE CONTENT-BASED ACTIVE SENSOR PLANNING FOR A MOBILE TRINOCULAR ACTIVE VISION SYSTEM

Aly A. Farag and Alaa E. Abdel-Hakim
Computer Vision and Image Processing Laboratory
University of Louisville, Louisville, KY 40292
E-mail: {farag, alaa}@cvip.uofl.edu
http://www.cvip.uofl.edu

ABSTRACT

In this paper, we present a sensor planning approach for a mobile trinocular active vision system. In the stationary state (i.e., no motion), the sensor planning system computes the generalized camera parameters (i.e., translational distance from the center, zoom, focus, and vergence) from deterministic geometric specifications of both the sensors and the objects in their field of view. Some of these geometric parameters are difficult to predetermine when the system is mobile. In this paper, we propose a new sensor planning approach based on processing the content of the captured images. The approach combines a closed-form solution for the translation between the three cameras, their vergence angles, and their zoom and focus settings with correspondences, obtained using the SIFT algorithm, between the acquired images and one or more predefined target objects. We demonstrate the accuracy of the new approach through practical experiments.

1. INTRODUCTION

A trinocular vision system for 3D reconstruction (CardEye) has been developed by our research team [1]. Instead of using just two cameras, as in conventional stereo vision systems, CardEye uses three cameras to improve the recovery process and make it more robust. Sensor planning in CardEye aims to determine generalized camera parameters, such as position and orientation, and optical settings such that the object features are within the field of view and in focus. As a stationary 3D reconstruction system, it is supplied with information about the object to be scanned and the working conditions.
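The stationary-case computation can be illustrated with a simplified geometric sketch. Assume (as an illustration only, not the system's exact formulation) that each camera is displaced a distance t from the system's central axis and must fixate the center of a virtual sphere of radius r located at distance d along that axis; the vergence angle, focus range, and minimum half field of view then follow directly from this geometry:

```python
import math

def plan_camera(t, d, r):
    """Simplified geometric sensor planning for one camera.

    t: camera's translational distance from the central axis
    d: distance from the system center to the sphere center
    r: radius of the virtual sphere containing the object

    Returns (vergence angle, focus range, minimum half field of view),
    all angles in radians. Illustrative geometry only.
    """
    range_to_center = math.hypot(t, d)  # length of the line of sight
    if r >= range_to_center:
        raise ValueError("camera lies inside the virtual sphere")
    vergence = math.atan2(t, d)                # inward rotation to fixate center
    half_fov = math.asin(r / range_to_center)  # sphere's angular radius
    return vergence, range_to_center, half_fov
```

For example, `plan_camera(t=0.3, d=2.0, r=0.5)` yields the vergence angle atan(0.3/2.0), a focus range of sqrt(0.3² + 2.0²), and the half field of view (hence zoom setting) needed to keep the whole sphere visible. This is exactly the computation that fails when d is unknown, which motivates the image content-based approach below.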
Specifically, the radius of the virtual sphere that contains the object and the distance between the center of that sphere and the cameras must be known before the sensor planning process starts. Sensor planning is then performed using those parameters combined with the geometric specifications of the system, so the generated parameters are computed exactly from the geometry.

In this paper, the stationary trinocular active vision system is extended to be mounted on a mobile robot. Its function is extended from reconstructing 3D objects at well-known positions to also fetching one or more specified target objects in the robot's navigation environment and then reconstructing their 3D models. Because the system is mobile, supplying it with the distance between the cameras and the center of the target object is extremely difficult, or in many cases impossible. Hence, the conventional geometric sensor planning approach used for the stationary system fails. Therefore, in this paper, we present a new sensor planning approach for the mobile system based on detecting a target object in the robot's navigation environment while utilizing the geometric specifications of the system. The proposed approach discards the distance between the cameras and the center of the virtual sphere containing the object as an input parameter. First, we use the Scale Invariant Feature Transform (SIFT) [2] to detect the target object. Then, the camera parameters are determined such that the number of corresponding features of the target object across the three images is maximized.

The SIFT approach transforms an image into a large collection of local feature vectors (descriptors). These descriptors are robust to image translation, scaling, rotation, and partial occlusion, and are partially invariant to illumination changes and affine projection. SIFT is therefore well suited to the proposed mobile system.
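The detection step relies on matching SIFT descriptors of the target object against descriptors extracted from each captured image. Descriptor extraction itself is beyond a short sketch, but assuming descriptors are already available as plain vectors (real SIFT descriptors are 128-dimensional), the standard matching step, Lowe's nearest-neighbor ratio test, can be sketched as:

```python
import math

def match_descriptors(target_desc, image_desc, ratio=0.8):
    """Match target-object descriptors against one image's descriptors
    using Lowe's nearest-neighbor ratio test.

    target_desc, image_desc: lists of equal-length float vectors
    (stand-ins for 128-D SIFT descriptors). Returns (i, j) index pairs.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    matches = []
    for i, td in enumerate(target_desc):
        ranked = sorted(range(len(image_desc)),
                        key=lambda j: dist(td, image_desc[j]))
        if len(ranked) >= 2:
            best, second = ranked[0], ranked[1]
            # accept only if the best match is clearly better than the runner-up
            if dist(td, image_desc[best]) < ratio * dist(td, image_desc[second]):
                matches.append((i, best))
    return matches
```

In the planning loop, the camera parameters would then be adjusted so that the match count summed over the three images is maximized, which is the criterion stated above.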
After the target object, or part of it, is detected in one or more of the three images, the geometric information of the system is employed in conjunction with the SIFT results to plan the sensors so as to maximize the number of features in the field of view.

1.1. Related work

A number of vision planning systems have been developed in past years that use prior information about the observed object and the applied sensors to automatically generate sensor parameters satisfying various vision constraints [3]. These techniques differ in the approach used to determine the sensor parameter values. They are mostly applicable to vision systems that observe known objects in known positions, e.g., visual inspection, surveillance, monitoring, or accurate 3D model reconstruction systems [1]. Several systems use a generate-and-test approach [3], in which sensor positions and settings are chosen and tested against the requirements of the task. For active vision systems, a single sensor configuration