1412 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 34, NO. 3, JUNE 2004
Articulated Pose Identification With
Sparse Point Features
Baihua Li, Qinggang Meng, and Horst Holstein
Abstract—We propose a general algorithm for identifying an
arbitrary pose of an articulated subject with sparse point features.
The algorithm aims to identify a one-to-one correspondence
between a model point-set and an observed point-set taken from
freeform motion of the articulated subject. We avoid common
assumptions such as pose similarity or small motions with respect
to the model, and assume no prior knowledge from which to infer
an initial or partial correspondence between the two point-sets.
The algorithm integrates local segment-based correspondences
under a set of affine transformations, and a global hierarchical
search strategy. Experimental results, based on synthetic poses and
real-world human motion data, demonstrate the ability of the
algorithm to perform the identification task. Reliability degrades
with increasing data noise and segmental distortion, but the
algorithm tolerates moderate levels of both. This work contributes
to establishing the crucial self-initializing identification step in
model-based point-feature tracking of articulated motion.
Index Terms—Articulated point pattern matching, motion
tracking and object recognition, nonrigid pose estimation.
I. INTRODUCTION
IN COMPUTER vision research, motion analysis and object
recognition have been largely restricted to rigid objects.
In the real world, however, nonrigid motion of objects is
the general rule. Tracking and identifying nonrigid motion,
ranging from articulated and elastic motion to fluid motion
[2], has drawn growing attention in the past decade, motivated
by potential applications such as human-machine interaction,
biomedical studies, molecular biology and computational
chemistry, the entertainment industry, and, more recently,
robot monitoring and control.
The nonrigid motion we consider is segment-based articulated
motion, such as occurs in skeletal biological motion. The motion
of each segment can be regarded as rigid or nearly rigid, but the
motion of the whole is high-dimensionally nonrigid. When such
articulated motion is represented by a sequence of feature points,
its spatio-temporal information is reduced to only a sequence of
moving points over time. Johansson's moving light displays [15]
demonstrated that human vision can perceive articulated structure
and motion solely from a small number of moving dots.
Unfortunately, identifying these points to recognize the underlying
structure and articulated motion in the real world is inherently
difficult for a machine. Most existing algorithms, for instance in
the field of "looking at people" [1], [21], have been designed to
deal with problems such as human body model acquisition [16],
three-dimensional (3-D) motion reconstruction from multiple
views [12], and two-dimensional (2-D)/3-D model-based tracking,
pose estimation, and recognition [9], [14], [30], using richer
information from the usual domain of color or intensity images.
However, there is a relative dearth of literature on articulated
motion reconstruction from only sparse point features.

Manuscript received February 13, 2003; revised September 13, 2003. This
paper was recommended by Associate Editor X. Jiang.
The authors are with the Department of Computer Science, University
of Wales, Aberystwyth, SY23 3DB, U.K. (e-mail: bal@aber.ac.uk;
qqm@aber.ac.uk; hoh@aber.ac.uk).
Digital Object Identifier 10.1109/TSMCB.2004.825914
In this study, we concentrate on the identification task,
addressing the problem of self-initializing model matching in
point-feature tracking. Our algorithm assumes the availability
of feature-point motion data obtainable by various methods and
sensors, such as the 3-D data used in our experiments, acquired
from a marker-based optical motion capture system. The articulated
object to be monitored is known a priori. The self-initializing
identification problem can therefore be formulated as point pattern
matching (PPM) of a pre-known "stick-figure" model of an
articulated object to its related motion data. Fitting the
individual model to its motion data is the routine
identification task addressed here.
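To make the segment-based matching idea concrete, the following is a minimal Python sketch, not the algorithm proposed in this paper: it scores one candidate labelling of a rigid segment's markers by aligning the model markers to the chosen observed points with a least-squares rigid fit (the Kabsch method) and keeping the labelling with the smallest residual. The function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def kabsch_rmsd(model, obs):
    """Least-squares rigid alignment (Kabsch): return the RMSD between
    the model points and the observed points after the optimal rotation
    and translation.  Both arrays have shape (n, 3)."""
    mc = model - model.mean(axis=0)          # centre both point-sets
    oc = obs - obs.mean(axis=0)
    u, _, vt = np.linalg.svd(mc.T @ oc)      # SVD of the cross-covariance
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = mc @ rot.T                     # rotate model onto observation
    return float(np.sqrt(((aligned - oc) ** 2).sum() / len(model)))

def best_segment_match(segment_model, candidates):
    """Try every assignment of the segment's model markers to a subset of
    candidate observed points; keep the labelling (a tuple of candidate
    indices) with the smallest rigid-fit residual."""
    best_rmsd, best_perm = np.inf, None
    for perm in permutations(range(len(candidates)), len(segment_model)):
        rmsd = kabsch_rmsd(segment_model, candidates[list(perm)])
        if rmsd < best_rmsd:
            best_rmsd, best_perm = rmsd, perm
    return best_rmsd, best_perm
```

Exhaustive permutation search of this kind grows factorially with the number of candidate points, which is one motivation for combining local segment fits with a global hierarchical search rather than matching the whole point-set at once.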
II. RELATED WORK IN PPM AND MOTION ESTIMATION
In vision analysis, object and/or motion recognition based
on feature point identification and/or tracking is commonly en-
countered in a wide variety of disciplines and applications [8],
[19]. Among the fundamental tasks in model-based point-feature
tracking and recognition systems, tracking has been investigated
extensively, based on assumptions such as smooth or small
inter-frame motion, or on high-level knowledge of a specific
motion [8], [9], [24], [25]. However, identification, which
establishes which point in an observed data frame corresponds
to which point in the model and thereby reconstructs the
embedded pose and structure, remains an open problem, especially
at the start or recommencement of tracking. Currently, most
tracking approaches simplify the problem to incremental pose
estimation, relying on an assumption of initial pose similarity to
the model, or on manual initialization between the model pose
and the first frame of each motion sequence.
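The incremental step that such trackers rely on can be sketched as a nearest-neighbour label propagation under the small inter-frame motion assumption (a hypothetical illustration, not the identification algorithm of this paper; the function name and `max_jump` threshold are assumptions):

```python
import numpy as np

def track_frame(prev_labeled, new_points, max_jump=0.05):
    """Propagate point labels from the previous frame, assuming small
    inter-frame motion.  prev_labeled maps a label to its last known
    position; new_points is an (n, 3) array.  Each label greedily claims
    its nearest unclaimed new point; a label whose nearest unclaimed
    point lies farther than max_jump is reported as lost (None)."""
    labels = {}
    claimed = set()
    for name, pos in prev_labeled.items():
        dists = np.linalg.norm(new_points - pos, axis=1)
        order = np.argsort(dists)
        pick = next((i for i in order if i not in claimed), None)
        if pick is None or dists[pick] > max_jump:
            labels[name] = None          # track lost: needs re-identification
        else:
            claimed.add(pick)
            labels[name] = new_points[pick]
    return labels
```

Greedy claiming can mislabel points when trajectories cross; a globally optimal assignment (e.g. the Hungarian algorithm) is more robust, but either way the tracker must be re-initialized by an identification step whenever a label is lost, which is exactly the problem addressed here.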
Numerous techniques relevant to PPM, such as geometric
hashing [29] and alignment, and image registration using dense
point-sets [6], [20], have been studied within a rich literature.
Many of these have focused on rigid [5], [10], [17], [22], [28],
approximate affine, or perspective transformations relevant to
point correspondence for the purposes of pose estimation and
object recognition [6], [7], [20], [26]. These methods are based
on geometric invariance or on constraint satisfaction embedded in
affine transformations, yielding approximate matches among
1083-4419/04$20.00 © 2004 IEEE