Robust 3D Human Pose Estimation
Guided by Filtered Subsets of Body Keypoints
Alexandros Makris
FORTH
amakris@ics.forth.gr
Antonis Argyros
FORTH, University of Crete
argyros@ics.forth.gr
Abstract
We propose a novel hybrid 3D human body pose es-
timation method that operates on RGBD input. The
method relies on a deep neural network to obtain an
initial 2D body pose. Using depth information from the
sensor, a set of 2D landmarks on the body is trans-
formed to 3D. Then, a multiple hypothesis tracker uses the
obtained 2D and 3D body landmarks to estimate the
3D body pose. To safeguard against observation
errors, each human pose hypothesis considered by the
tracker is constructed using a gradient descent opti-
mization scheme that is applied to a subset of the body
landmarks. Landmark selection is driven by geometric
constraints and temporal continuity criteria.
The resulting 3D poses are evaluated by an objective
function that densely measures the discrepancy between
the 3D structure of the rendered 3D human body model
and the depth actually observed by the sensor. Quan-
titative experiments show the advantages of the pro-
posed method over a baseline that directly uses all land-
mark observations for the optimization, as well as over
other recent 3D human pose estimation approaches.
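The first step of the pipeline described above, lifting detected 2D landmarks to 3D using the sensor depth, amounts to standard pinhole back-projection. The following is a minimal sketch, assuming known camera intrinsics (fx, fy, cx, cy) and a metric depth map; the function name and signature are illustrative, not from the paper.

```python
import numpy as np

def backproject_landmarks(landmarks_2d, depth_map, fx, fy, cx, cy):
    """Lift 2D pixel landmarks (u, v) to 3D camera-space points using
    the sensor depth map and pinhole intrinsics. Assumes depth_map
    holds metric depth (e.g., metres) indexed as [row, col]."""
    points_3d = []
    for (u, v) in landmarks_2d:
        z = depth_map[int(round(v)), int(round(u))]  # depth at the landmark pixel
        # Invert the pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points_3d.append((x, y, z))
    return np.array(points_3d)
```

In practice, a landmark with missing or invalid depth (z = 0 on many RGBD sensors) would be flagged rather than back-projected; the paper's landmark filtering then decides which of the lifted points each hypothesis actually uses.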
1 Introduction
Vision-based human motion capture is a fundamental
problem with many applications. Markerless, unobtru-
sive methods have received a lot of attention from the
computer vision community, and considerable progress
has already been achieved. However, accurate, fast and
robust 3D human pose estimation in the wild remains
an open problem.
1.1 Related Work
Human body pose estimation techniques can be
classified into three broad classes: bottom-up dis-
criminative methods, top-down generative methods,
and hybrid ones. Generative methods can be very
accurate, provide physically plausible solutions and do
not require training. However, typically, they are com-
putationally demanding, require initialization and can
suffer from drift and track loss. Discriminative meth-
ods perform single frame pose estimation and do not
require initialization. On the other hand, they rely on
large collections of annotated training data, and their
solutions are not always physically plausible.

Figure 1. At each frame, the proposed method
takes as input the previous pose hypotheses H_{t-1},
the 2D landmarks (green discs) extracted from
the RGB image and the corresponding 3D land-
marks (blue discs) calculated using depth. It then
generates a set of hypotheses H_t for the current
frame. For each hypothesis, a different subset of
the detected landmarks is used (red discs). The
best hypothesis is selected by densely measuring
its discrepancy from the observed depth.

Hybrid methods integrate elements from both worlds
in an effort to combine their merits.
Most recent human pose estimation methods rely
on 2D keypoints extracted from RGB data [1, 2]. The
accuracy of these methods is high, mainly due to the
availability of large annotated datasets [3,4]. By build-
ing on the 2D keypoints and relying on RGB informa-
tion only, many recent approaches perform either 2D
pose estimation [1] or 3D pose estimation [5–9]. To
tackle the difficulties of lifting 2D keypoints to 3D,
some methods directly regress 3D keypoints or volu-
metric representations [10]. Recent approaches proceed
further to estimate both the pose and the shape of the
human body [11–14]. The work in [15] establishes
dense correspondences between images and a 3D human
body model. Approaches that rely on RGB information
only either produce a scale-normalized output or rely
on prior assumptions to determine the model's scale. In
both cases, their applicability in a number of domains
(e.g., robotics) is limited.
To recover the full 3D human body pose in a
real-world coordinate frame, most approaches rely on
RGBD sensors. The work in [16] that relies on ran-
16th International Conference on Machine Vision Applications (MVA)
National Olympics Memorial Youth Center, Tokyo, Japan, May 27-31, 2019.
© 2019 MVA Organization