IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 34, NO. 3, JUNE 2004

Articulated Pose Identification With Sparse Point Features

Baihua Li, Qinggang Meng, and Horst Holstein

Abstract—We propose a general algorithm for identifying an arbitrary pose of an articulated subject from sparse point features. The algorithm identifies a one-to-one correspondence between a model point-set and an observed point-set taken from freeform motion of the articulated subject. We avoid common assumptions such as pose similarity or small motion with respect to the model, and assume no prior knowledge from which to infer an initial or partial correspondence between the two point-sets. The algorithm integrates local segment-based correspondences under a set of affine transformations with a global hierarchical search strategy. Experimental results, based on synthetic poses and real-world human motion data, demonstrate the ability of the algorithm to perform the identification task. Reliability is increasingly compromised by growing data noise and segmental distortion, but the algorithm can tolerate moderate levels of both. This work contributes to establishing a crucial self-initializing identification stage in model-based point-feature tracking for articulated motion.

Index Terms—Articulated point pattern matching, motion tracking and object recognition, nonrigid pose estimation.

I. INTRODUCTION

IN COMPUTER vision research, motion analysis and object recognition have largely been restricted to rigid objects. In the real world, however, nonrigid motion of objects is the general rule. Tracking and identifying nonrigid motion, ranging from articulated and elastic motion to fluid motion [2], has drawn growing attention in the past decade, motivated by potential applications such as human-machine interaction, biomedical studies, molecular biology and computational chemistry, the entertainment industry, and, more recently, robot monitoring and control.
The nonrigid motion we consider is segment-based, jointed articulated motion, such as occurs in skeletal biological motion. The motion of each segment can be considered rigid or nearly rigid, but the whole motion is high-dimensionally nonrigid. When such articulated motion is represented by a sequence of feature points, the spatio-temporal information of the motion is notably reduced to only a sequence of moving points over time. Johansson's moving light displays [15] demonstrated that human vision can perceive articulated structure and motion solely from a small number of moving dots. Unfortunately, identifying these points to recognize the underlying structure and articulated motion in the real world is inherently difficult for a machine. Most existing algorithms, for instance in the field of "looking at people" [1], [21], have been designed to deal with problems such as human body model acquisition [16], three-dimensional (3-D) motion reconstruction from multiple views [12], and two-dimensional (2-D)/3-D model-based tracking, pose estimation, and recognition [9], [14], [30], using richer information from the usual domain of color or intensity images. However, there is a relative dearth of literature on articulated motion reconstruction from only sparse point features.

In this study, we concentrate on the identification task to address the problem of self-initializing model matching in point-feature tracking. Our algorithm therefore assumes the availability of feature-point motion data that might be obtained by various methods and sensors, such as the 3-D data used in our experiments, obtained from a marker-based optical motion capture system. The articulated object to be monitored is known a priori. The self-initializing identification problem can therefore be formulated as point pattern matching (PPM) of a pre-known "stick-figure" model of an articulated object to its related motion data. Fitting the individual model to its motion data is the routine identification task addressed here.

Manuscript received February 13, 2003; revised September 13, 2003. This paper was recommended by Associate Editor X. Jiang. The authors are with the Department of Computer Science, University of Wales, Aberystwyth, SY23 3DB, U.K. (e-mail: bal@aber.ac.uk; qqm@aber.ac.uk; hoh@aber.ac.uk). Digital Object Identifier 10.1109/TSMCB.2004.825914

II. RELATED WORKS IN PPM AND MOTION ESTIMATION

In vision analysis, object and/or motion recognition based on feature-point identification and/or tracking is commonly encountered in a wide variety of disciplines and applications [8], [19]. Of the fundamental tasks in model-based point-feature tracking and recognition systems, tracking algorithms have been investigated extensively, based on assumptions such as smooth or small inter-frame motion, or on high-level knowledge related to a specific motion [8], [9], [24], [25]. However, identification, which establishes which point in an observed data frame corresponds to which point in its model and thereby reconstructs the embedded pose and structure, remains an open problem, especially at the start or recommencement of tracking. Currently, most tracking approaches simplify the problem to incremental pose estimation, relying on an assumption of initial pose similarity to the model, or on manual initialization between the model pose and the first frame of each motion sequence. Numerous techniques relevant to PPM, such as geometric hashing [29] and alignment, and image registration using dense point-sets [6], [20], have been studied within a rich literature.
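To make the PPM formulation concrete, the sketch below (our illustration, not the authors' published algorithm; all function names are hypothetical) scores a candidate model-to-data point labeling for a single near-rigid segment by the residual of a least-squares rigid fit (the Kabsch/SVD method), and recovers the labeling of a small segment by exhaustive search over point orderings:

```python
import numpy as np
from itertools import permutations

def rigid_fit_residual(model, data):
    """RMS residual of the best least-squares rigid (rotation +
    translation) fit of `model` onto `data`, both (k, 3) arrays.
    Uses the Kabsch/SVD solution on the centered point-sets."""
    mc, dc = model.mean(axis=0), data.mean(axis=0)
    A, B = model - mc, data - dc
    U, _, Vt = np.linalg.svd(A.T @ B)
    # Correct an improper rotation (reflection) if the determinant is -1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    aligned = A @ R.T + dc                 # model mapped into data frame
    return np.sqrt(np.mean(np.sum((aligned - data) ** 2, axis=1)))

def identify_segment(model_pts, observed_pts):
    """Exhaustively test orderings of the observed points of one small
    segment; return the ordering with the lowest rigid-fit residual."""
    return min(permutations(range(len(observed_pts))),
               key=lambda p: rigid_fit_residual(model_pts,
                                                observed_pts[list(p)]))

# Toy usage: a 4-point segment under a known rotation and translation,
# observed in scrambled order.
rng = np.random.default_rng(0)
model = rng.normal(size=(4, 3))
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
obs = model @ Rz.T + np.array([1.0, 2.0, 3.0])
observed = obs[[2, 0, 3, 1]]               # unknown labeling
labeling = identify_segment(model, observed)
```

The exhaustive search is affordable only because each segment carries very few markers; the paper's hierarchical strategy exists precisely to avoid such search over the whole articulated model at once.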
Many of these methods have focused on rigid [5], [10], [17], [22], [28], approximate affine, or perspective transformations relevant to point correspondence for the purposes of pose estimation and object recognition [6], [7], [20], [26]. These methods are based on geometric invariance or constraint satisfaction embedded in affine transformations, yielding approximate matches among

1083-4419/04$20.00 © 2004 IEEE