ROUGH COMPRESSED DOMAIN CAMERA POSE ESTIMATION THROUGH OBJECT MOTION

Christian Käs, Henri Nicolas
LaBRI, University of Bordeaux 1, 351 cours de la Libération, 33405 Talence Cedex, France
{kaes,nicolas}@labri.fr

ABSTRACT

We present an unsupervised method to estimate the camera orientation angle of monocular video scenes in the H.264 compressed domain. The method relies on the presence of moving objects in the scene. We first estimate the global camera motion from the motion vectors present in the stream, then detect and track moving objects and estimate their relative distance to the camera by analyzing the temporal evolution of the objects' dimensions. The evolution of the motion-compensated vertical positions of key points within moving objects is used to infer the extrinsic orientation angle of the camera.

Index Terms— Compressed domain, camera pose estimation, object distance estimation

1. INTRODUCTION

Fully automated analysis of multimedia content is an interesting and challenging research area. Given the enormous amount of available content, indexing and analysis algorithms need to be fast and robust. A common approach is to re-use the motion information already present in compressed video streams to save processing time. Many existing approaches use this motion information to segment the scene into foreground and background and to track moving objects.

In the pixel domain, a number of multi-view and single-view algorithms for estimating object distance have been presented. Examples of multi-view approaches are given in [1, 2]. An incomplete list of single-camera approaches includes [3, 4, 5, 6], where defocus [4, 6] or object size [5] is used as a depth indicator. Another approach is provided by Rosales [7], who applies extended Kalman filtering to reconstruct the relative 3D trajectories. The work mentioned above relies on pixel-domain features and cannot be adapted to the compressed domain.
Mbonye [8] uses MPEG-2 compressed-domain data to adjust the camera pose by attentive visual servoing tailored to a road-traffic application. In the present article, we go one step further and exploit single-view compressed-domain tracking results to infer the objects' relative distance and the orientation angle of the camera, with no a priori knowledge of the scene setup.

The remainder of this article is organized as follows. We present the different stages of our method, starting with the segmentation and tracking of moving objects in Sec. 2. Certain object properties are further processed to estimate the relative distance to the camera in Sec. 3, followed by the estimation of the camera angle in Sec. 4. The results of each stage are provided within the respective section.

2. OBJECT EXTRACTION AND TRACKING

The detection of moving objects is based on the motion vectors (MVs) associated with B- and P-slice macroblocks in the H.264 stream. To extract them, only the entropy coding has to be reversed. As a first processing step, we estimate camera motion by an iterative re-weighted least-squares fit of the 6-parameter affine motion model. The outputs are the six model parameters a1...a6 and outlier masks of all MVs that do not follow the global motion. These outlier masks mainly correspond to moving objects, but they are also subject to noise when large, low-textured areas or non-static background appear. To alleviate the impact of these effects, spatio-temporal filtering along the MV trajectories is performed. The filtered outlier masks give a rough segmentation of the scene into background and foreground objects.

Fig. 1. From left to right: screenshot, raw outlier mask, filtered mask with detected objects.

In the frame-wise detection stage, we consider each connected region in the filtered mask image as one object. We then calculate and store certain properties of these objects, namely the i) size, ii) orientation, iii) local motion, iv) width and height along the principal axes, v) center of gravity, and vi) top and bottom positions.

This work has been carried out in the context of the French national project ICOS-HD (ANR-06-MDCA-010-03) funded by the Agence Nationale de la Recherche (ANR).

978-1-4244-5654-3/09/$26.00 ©2009 IEEE ICIP 2009
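The global-motion step of Sec. 2 can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the function name `fit_affine_irls`, the weight function 1/(1+r), the number of iterations, and the outlier threshold are all our own illustrative assumptions; only the 6-parameter affine model and the iterative re-weighted least-squares scheme come from the text.

```python
import numpy as np

def fit_affine_irls(pos, mv, n_iter=5, outlier_thresh=2.0):
    """Fit the 6-parameter affine motion model
        u = a1 + a2*x + a3*y,   v = a4 + a5*x + a6*y
    to macroblock motion vectors by iterative re-weighted least squares.

    pos: (N, 2) macroblock centre coordinates (x, y)
    mv:  (N, 2) motion vectors (u, v) read from the stream
    Returns (params, outliers): the six model parameters a1..a6 and a
    boolean mask flagging vectors that do not follow the global motion.
    """
    x, y = pos[:, 0], pos[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])  # design matrix, shared by u and v
    w = np.ones(len(pos))                         # start with uniform weights
    for _ in range(n_iter):
        Aw = A * w[:, None]
        # weighted least squares, solved separately for the u and v components
        cu, *_ = np.linalg.lstsq(Aw, mv[:, 0] * w, rcond=None)
        cv, *_ = np.linalg.lstsq(Aw, mv[:, 1] * w, rcond=None)
        # residual magnitude of each MV against the current global-motion fit
        res = np.hypot(mv[:, 0] - A @ cu, mv[:, 1] - A @ cv)
        w = 1.0 / (1.0 + res)                     # down-weight badly fitting vectors
    params = np.concatenate([cu, cv])             # a1..a6
    outliers = res > outlier_thresh               # candidate foreground blocks
    return params, outliers
```

The soft 1/(1+r) re-weighting is one common choice for making the fit robust; a Huber or Tukey weight would serve equally well. The outlier mask produced here corresponds to the raw mask of Fig. 1, before the spatio-temporal filtering along MV trajectories.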