ROUGH COMPRESSED DOMAIN CAMERA POSE ESTIMATION THROUGH OBJECT
MOTION
Christian Käs, Henri Nicolas
LaBRI, University of Bordeaux 1, 351 cours de la Libération, 33405 Talence Cedex, France
{kaes,nicolas}@labri.fr
ABSTRACT
We present an unsupervised method to estimate the camera
orientation angle from monocular video scenes in the H.264
compressed domain. The method is based on the presence
of moving objects in the scene. We start by estimating the
global camera motion based on the motion vectors present in
the stream, detect and track moving objects and estimate their
relative distance to the camera by analyzing the temporal evo-
lution of the objects’ dimensions. The evolution of the motion-
compensated vertical positions of key points within moving
objects is used to infer the extrinsic orientation angle of the
camera.
Index Terms— Compressed domain, camera pose esti-
mation, object distance estimation
1. INTRODUCTION
Fully automated analysis of multimedia content is an interest-
ing and challenging research area. Given the enormous
amount of available content, indexing and analysis algorithms
must be fast and robust. A common approach is
to re-use motion information already present in compressed
video streams to save processing time. Many of the existing
approaches use this motion information to segment the scene
into foreground and background and to track moving objects.
In the pixel domain, a number of multi-view and single
view algorithms for estimating the objects’ distance have been
presented. Examples of multi-view approaches are given in
[1, 2]. An incomplete list of single camera approaches in-
cludes [3, 4, 5, 6], where defocus [4, 6] or object size [5]
are used as depth indicators. Another approach is provided
by Rosales [7] who applies extended Kalman filtering to re-
construct the relative 3D trajectories. The work mentioned
above relies on pixel-domain features and cannot be adapted to the
compressed domain. Mbonye [8] uses MPEG-2 compressed
domain data to adjust the camera pose by attentive visual ser-
voing tailored to a road traffic application. In the present
article, we go one step further and exploit single-view com-
pressed domain tracking results to infer the objects’ relative
distance and the orientation angle of the camera, with no a
priori knowledge of the scene setup.

This work has been carried out in the context of the French national
project ICOS-HD (ANR-06-MDCA-010-03), funded by the Agence Na-
tionale de la Recherche (ANR).
The remainder of this article is organized as follows. We
present the different stages of our method, starting with the
segmentation and tracking of moving objects in Sec. 2. Cer-
tain object properties are further processed to estimate the
relative distance to the camera in Sec. 3, followed by the
estimation of the camera angle in Sec. 4. The results of each
stage are provided within the respective section.
2. OBJECT EXTRACTION AND TRACKING
The detection of moving objects is based on the motion vec-
tors (MVs) associated with B- or P-slice macroblocks in the
H.264 stream. In order to extract them, only the entropy cod-
ing has to be reversed. As a first processing step, we estimate
camera motion by an iterative re-weighted least-squares fit-
ting of the 6-parameter affine motion model. The output
consists of the six model parameters a_1, ..., a_6 and outlier
masks of all MVs that do not follow the global motion. These
outlier masks mainly
correspond to moving objects, but are also subject to noise if
large, low-textured areas or non-static background appears. In
order to alleviate the impact of these effects, spatio-temporal
filtering along the MV trajectories is performed. The filtered
outlier masks give a rough segmentation of the scene in back-
ground and foreground objects.

Fig. 1. From left to right: screenshot - raw outlier mask -
filtered mask with detected objects.

In the frame-wise detection
stage, we consider each connected region in the filtered mask
image as one object. We then calculate and store certain prop-
erties of these objects, namely the i) size, ii) orientation, iii)
local motion, iv) width and height along the principal axes,
v) center of gravity and vi) top and bottom positions. The
978-1-4244-5654-3/09/$26.00 ©2009 IEEE ICIP 2009
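The global-motion stage above (iterative re-weighted least-squares fitting of a 6-parameter affine model, producing an outlier mask for blocks that do not follow the camera motion) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the parametrization dx = a_1 + a_2*x + a_3*y, dy = a_4 + a_5*x + a_6*y, the Tukey-biweight re-weighting with a MAD scale estimate, and the inlier threshold are all choices made here for the sketch.

```python
# Sketch of IRLS fitting of a 6-parameter affine global-motion model to
# block motion vectors. The weighting scheme (Tukey biweight, MAD scale)
# and the threshold are illustrative assumptions, not from the paper.
import numpy as np

def fit_affine_irls(pos, mv, iters=10, inlier_thresh=1.0):
    """pos: (N, 2) block centers (x, y); mv: (N, 2) motion vectors (dx, dy).
    Returns (a, inliers): parameters a_1..a_6 and a per-block boolean mask."""
    x, y = pos[:, 0], pos[:, 1]
    # Design matrix: dx = a1 + a2*x + a3*y, dy = a4 + a5*x + a6*y
    A = np.zeros((2 * len(pos), 6))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = 1.0, x, y
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = 1.0, x, y
    b = mv.reshape(-1)
    w = np.ones_like(b)
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least-squares solve of the affine parameters
        a, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = A @ a - b
        # Tukey-biweight re-weighting with a robust (MAD) scale estimate:
        # blocks with large residuals (moving objects) get weight 0
        s = 1.4826 * np.median(np.abs(r)) + 1e-12
        u = r / (4.685 * s)
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    res = np.hypot(r[0::2], r[1::2])  # per-block residual magnitude
    return a, res < inlier_thresh
```

Blocks flagged as outliers (the complement of the returned inlier mask) correspond to the raw foreground mask of Fig. 1, before the spatio-temporal filtering step.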