3D GAIT ESTIMATION FROM MONOSCOPIC VIDEO

Angel D. Sappa
Computer Vision Center
Edifici O, Campus UAB
08193 Bellaterra - Barcelona, Spain
angel.sappa@cvc.uab.es

Niki Aifanti, Sotiris Malassiotis, Michael G. Strintzis
Informatics & Telematics Institute
1st Km Thermi-Panorama Road
Thermi-Thessaloniki, Greece
{naif, malasiot}@iti.gr, strintzi@eng.auth.gr

This work has been carried out as part of the ATTEST project (Advanced Three-dimensional TElevision System Technologies, IST-2001-34396). The first author has been supported by the Ramón y Cajal Program.

ABSTRACT

This paper presents a new approach for 3D gait estimation from monocular image sequences, using both a kinematic model and a walking motion model as sources of prior knowledge. The proposed technique consists of two major stages. First, the motion trajectory and the pedestrian's footprints are detected throughout the segmented video sequence. Second, as the 3D human model, driven by the prior motion model, walks over this trajectory, the joint angles are locally adjusted to the pedestrian's walking style. This tuning process is performed once per walking cycle rather than per frame, saving considerable CPU time. In addition, local tuning allows handling displacements at different speeds or in different directions. The target application is the augmentation of 2D television sequences with depth information that may be used in future 3D-TV systems.

1. INTRODUCTION

3D-TV opens a new and attractive field of applications, from more realistic movies to interactive environments. However, in order to fully exploit these new 3D-TV systems, all the existing 2D video material should be converted into 3D. In theory, 3D information cannot be completely recovered from 2D video sequences when no extra information is given or can be estimated. Since television sequences are populated with objects of known structure and motion, such as humans and cars, prior knowledge can arguably aid the recovery of the scene. Prior knowledge in the form of kinematic constraints (average size of an articulated structure, degrees of freedom (DOFs) for each articulation) or motion dynamics (physical laws ruling the objects' movements) is a common solution to this problem.

In real-world conditions, 3D human motion modeling from monocular image sequences is a complex and challenging problem, involving difficulties such as self-occlusions, depth ambiguities of the body parts, walking direction estimation, and erroneous background segmentation (see [1] for more details). In order to avoid some of these problems, 3D human walking modeling has usually been tackled by making simplifying assumptions (e.g. [2], [3], [4]) or by imposing constraints on the motion (e.g. walking in a plane orthogonal to the camera at a constant speed [5], [6]). Moreover, in order to register the projection of the computed 3D model with the given image, several features have been combined [7], such as skin color, edges, skeleton, and optical flow.

The proposed approach consists of dividing the given walking sequence into separate walking cycles, which are processed independently. An explicit motion model, defined by a set of motion curves driving each articulation, is used as an initial approximation of the motion. These curves, obtained from anthropometric studies [6], are individually tuned by the algorithm according to the walking attitude of each pedestrian (Fig. 1). The main advantage compared with previous approaches is that matching between the projection of the 3D model and the image features is performed once per walking cycle rather than per frame. A brief description of the 3D body modeling, together with depth estimation, is given below. The proposed technique is presented in sections 4 and 5. Section 6 shows experimental results, and finally conclusions and future work are presented in section 7.
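As a concrete illustration (not the authors' implementation), the following minimal Python sketch shows one way such per-articulation motion curves could be represented and tuned once per walking cycle; the names (MotionCurve, tune_once_per_cycle) and the linear scale/offset tuning are assumptions made for exposition:

import numpy as np

class MotionCurve:
    """A joint-angle trajectory over one normalized walking cycle.

    The prior samples would come from anthropometric studies, as in [6].
    """
    def __init__(self, phases, angles_deg):
        self.phases = np.asarray(phases, dtype=float)      # cycle phase in [0, 1]
        self.angles = np.asarray(angles_deg, dtype=float)  # prior joint angles
        self.scale, self.offset = 1.0, 0.0                 # per-pedestrian tuning

    def __call__(self, phase):
        # Evaluate the tuned curve at any phase of the cycle.
        prior = np.interp(np.mod(phase, 1.0), self.phases, self.angles)
        return self.scale * prior + self.offset

def tune_once_per_cycle(curve, obs_phases, obs_angles):
    # Fit scale/offset to the angles observed over a single walking cycle
    # by least squares; this runs once per cycle, not once per frame.
    prior = np.interp(np.mod(obs_phases, 1.0), curve.phases, curve.angles)
    A = np.stack([prior, np.ones_like(prior)], axis=1)
    (curve.scale, curve.offset), *_ = np.linalg.lstsq(
        A, np.asarray(obs_angles, dtype=float), rcond=None)

# Example: a crude knee-flexion prior, tuned to three observed angles.
knee = MotionCurve([0.0, 0.25, 0.5, 0.75, 1.0], [5.0, 60.0, 5.0, 60.0, 5.0])
tune_once_per_cycle(knee, [0.1, 0.4, 0.7], [35.0, 30.0, 50.0])

Because the adjustment is computed once per cycle, changes in walking speed or direction only alter how the cycle phase is sampled, not the cost of the tuning step.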
2. 3D BODY MODELING

In the current work, similarly to [8], an articulated structure defined by 16 links (superquadrics) and 22 DOFs was chosen: 4 for each arm and each leg, and 6 for the torso (3 for orientation and 3 for position) (Fig. 2(left)). However, in order to reduce complexity, it was assumed that, while walking, the legs' and arms' movements are contained in parallel planes and that the body's orientation is always orthogonal to the floor.
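As a point of reference, the following Python sketch encodes this DOF budget; the class name and field layout are expository assumptions, not the paper's data structure, and the 16 superquadric links that flesh out the skeleton are omitted:

from dataclasses import dataclass, field

@dataclass
class BodyModel:
    # 6 torso DOFs: 3 for position and 3 for orientation.
    torso_position: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    torso_orientation: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    # 4 DOFs per limb; under the planar-walking assumption these reduce to
    # rotations within planes parallel to the walking direction.
    limb_angles: dict = field(default_factory=lambda: {
        limb: [0.0, 0.0, 0.0, 0.0]
        for limb in ("left_arm", "right_arm", "left_leg", "right_leg")
    })

    def dof_count(self):
        return (len(self.torso_position) + len(self.torso_orientation)
                + sum(len(a) for a in self.limb_angles.values()))

assert BodyModel().dof_count() == 22  # 6 torso + 4 limbs x 4 DOFs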