CONSTRUCTING PANORAMIC VIEWS THROUGH FACIAL GAZE TRACKING

Fadi Dornaika
French Geographical Institute (IGN), 94165 Saint-Mandé, France

Bogdan Raducanu *
Computer Vision Center, UAB, 08193 Bellaterra, Barcelona, Spain

ABSTRACT

This paper describes a human-machine interaction application for building panoramic views easily and efficiently. The panoramas are not limited to the 1D problem (one axis of rotation). The viewing direction of the camera acquiring the snapshots is directly controlled by the user's gaze direction, tracked with a 3D face tracker. Natural face motions can thus be used to control a local or remote camera in order to build panoramic views. The resulting system may find applications in online environment mapping as well as in video surveillance. The developed system was applied to map several indoor and outdoor scenes.

Index Terms— Panoramic view, Human Computer Interaction, Face tracker

1. INTRODUCTION

Building panoramic images is very useful for many applications such as augmented and virtual reality, environment mapping, and video surveillance. There are mainly two ways to build a panoramic view. The first is to use a wide-field sensor such as an omnidirectional or catadioptric sensor. Catadioptric sensors allow panoramic images to be captured without any camera motion. However, since a single sensor is typically used for the entire panorama, the resolution may be inadequate for many applications. The second is to compose mosaics from individual high-resolution images acquired independently by one or more cameras [1, 2]. There are two major steps in image mosaicking: i) image registration, and ii) image blending. Image registration determines the geometric transformations that align the images into a mosaic. These transformations are 2D projective mappings (homographies) when the camera rotates around its center of projection. Once the images are aligned and warped, blending is needed to eliminate artifacts along image borders.
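As a minimal illustration of the registration step (a sketch, not the authors' implementation), the following code applies the homography induced by a pure camera rotation, H = K R K^{-1}, to warp a pixel coordinate into the mosaic frame. The intrinsic matrix K and the 5-degree pan rotation are assumed example values.

```python
import numpy as np

def warp_point(H, x, y):
    """Map a pixel (x, y) through a 3x3 homography H
    and return inhomogeneous coordinates in the target image."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Hypothetical intrinsic matrix (focal length 500 px, principal point at image center)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Rotation about the vertical axis: a small pan of the camera
theta = np.deg2rad(5.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])

# Homography relating the two views of a purely rotating camera
H = K @ R @ np.linalg.inv(K)

u, v = warp_point(H, 320.0, 240.0)  # warp the principal point
```

A pure pan leaves the vertical coordinate of the principal point unchanged and shifts the horizontal coordinate by f·tan(theta), which is a quick sanity check on the computed H.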
Building image mosaics under pure camera rotation has become a classical task [3]. However, controlling the camera viewing direction is done either manually (hand-held camera) or automatically, using a predefined sequence of pan and tilt angle values. While these schemes are well suited for the 1D problem (one axis of rotation), they become very difficult for the general case (capturing a large part of the whole viewing sphere). Therefore, in the general case, the interaction between the user and the viewed scene is very limited. If the camera viewing direction is controlled online by the user's gesture (e.g., the user's hand or facial gaze acts as a 3D pointing device), then not only are the components of the panoramic view acquired in a human-machine interaction fashion, but the panorama is also directly controlled by the user, in the sense that it can be easily and rapidly updated. For example, if the scene is a meeting room, the number of individual snapshots (panorama components) can be high for the pitch angles associated with the subjects, and very low for a non-interesting region (e.g., the room ceiling).

In this paper, we introduce a system capable of building panoramic views that are not limited to the 1D problem, through the use of tracked 3D face gaze. Recently, we developed a real-time face and facial feature tracking method based on Online Appearance Models (OAMs) [4]. This paper shows that panoramic views can be easily constructed using our face tracker [4]. The user's tracked gaze continuously controls the gaze of a robotic vision sensor used for acquiring several snapshots of the scene. The proposed scheme for estimating and tracking the 3D face pose is automatic. Moreover, the panorama builder used is fully automatic [5].

* This work is supported by MEC Grant TIN2006-15308-C02, Ministerio de Educación y Ciencia, Spain, and the Ramón y Cajal research program.
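The idea of driving the acquisition camera from the tracked head pose can be sketched as a simple mapping from yaw/pitch angles to pan/tilt commands. The gain and the mechanical limits below are hypothetical values for illustration; the paper does not specify the actual mapping.

```python
def gaze_to_pan_tilt(yaw_deg, pitch_deg, gain=1.0,
                     pan_limits=(-90.0, 90.0), tilt_limits=(-30.0, 30.0)):
    """Map tracked head yaw/pitch (degrees) to pan/tilt commands
    for the acquisition camera, clamped to the unit's mechanical range.

    gain and the limit ranges are assumed example parameters.
    """
    def clamp(value, lo, hi):
        return max(lo, min(hi, value))

    pan = clamp(gain * yaw_deg, *pan_limits)
    tilt = clamp(gain * pitch_deg, *tilt_limits)
    return pan, tilt

# Head turned 30 deg right, tilted 10 deg down -> camera follows directly
pan, tilt = gaze_to_pan_tilt(30.0, -10.0)

# Extreme head pose -> commands saturate at the mechanical limits
pan_sat, tilt_sat = gaze_to_pan_tilt(120.0, 50.0)
```

A gain other than 1 would let small, comfortable head motions cover a wide camera range, which matters when the target panorama spans a large part of the viewing sphere.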
The remainder of the paper is organized as follows. Section 2 briefly describes the 3D face tracker, which operates on a monocular video sequence captured by a fixed camera. Section 3 describes the proposed automatic camera control for panoramic view construction and presents some experimental results.

2. 3D FACE TRACKER

In our study, we use the 3D face model Candide [6]. This 3D deformable wireframe model was first developed for the purpose of model-based image coding and computer animation. Besides its simplicity, this 3D model encapsulates facial actions due, for instance, to facial expressions. At any time, the 3D deformable model can be described by the state vector:

b = [θ_x, θ_y, θ_z, t_x, t_y, t_z, τ_a^T]^T    (1)

where: