Robust 3D Human Pose Estimation Guided by Filtered Subsets of Body Keypoints

Alexandros Makris
FORTH
amakris@ics.forth.gr

Antonis Argyros
FORTH, University of Crete
argyros@ics.forth.gr

Abstract

We propose a novel hybrid 3D human body pose estimation method that uses RGBD input. The method relies on a deep neural network to obtain an initial 2D body pose. Using depth information from the sensor, a set of 2D landmarks on the body is lifted to 3D. A multiple-hypothesis tracker then uses the obtained 2D and 3D body landmarks to estimate the 3D body pose. To safeguard against observation errors, each human pose hypothesis considered by the tracker is constructed by a gradient descent optimization scheme applied to a subset of the body landmarks. Landmark selection is driven by a set of geometric constraints and temporal continuity criteria. The resulting 3D poses are evaluated by an objective function that densely measures the discrepancy between the 3D structure of the rendered 3D human body model and the depth actually observed by the sensor. Quantitative experiments show the advantages of the proposed method over a baseline that directly uses all landmark observations for the optimization, as well as over other recent 3D human pose estimation approaches.

1 Introduction

Vision-based human motion capture is an essential problem with many applications. Markerless, unobtrusive methods have received a lot of attention from the computer vision community and considerable progress has already been achieved. However, accurate, fast and robust 3D human pose estimation in the wild is still an open problem.

1.1 Related Work

Human body pose estimation techniques may be classified into three broad classes: bottom-up discriminative methods, top-down generative methods, and hybrid methods. Generative methods can be very accurate, provide physically plausible solutions and do not require training.
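The lifting of 2D landmarks to 3D using the sensor depth, as described in the abstract, can be sketched with standard pinhole back-projection. This is a minimal illustration only; the function name, the intrinsics handling, and the nearest-pixel depth lookup are assumptions, not the paper's implementation:

```python
import numpy as np

def backproject_landmarks(kp2d, depth, fx, fy, cx, cy):
    """Lift 2D keypoints (u, v) to 3D camera coordinates using the
    depth map and a pinhole camera model with focal lengths (fx, fy)
    and principal point (cx, cy). Hypothetical helper for illustration."""
    pts3d = []
    for u, v in kp2d:
        # Read the sensor depth (metres) at the nearest pixel.
        z = depth[int(round(v)), int(round(u))]
        # Invert the pinhole projection to recover X and Y.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        pts3d.append((x, y, z))
    return np.asarray(pts3d)
```

In practice an invalid (zero) depth reading at a keypoint would have to be handled, e.g., by sampling a small neighborhood; that is omitted here for brevity.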
However, they are typically computationally demanding, require initialization, and can suffer from drift and track loss. Discriminative methods perform single-frame pose estimation and do not require initialization. On the other hand, they rely on large collections of annotated training data and their solutions are not always physically plausible. Hybrid methods integrate elements from both worlds in an effort to combine their merits.

Figure 1. At each frame, the proposed method takes as input the previous pose hypotheses H_{t-1}, the 2D landmarks (green discs) extracted from the RGB image and the corresponding 3D landmarks (blue discs) calculated using depth. It then generates a set of hypotheses H_t for the current frame. For each hypothesis a different subset of the detected landmarks is used (red discs). The best hypothesis is selected by densely measuring its discrepancy from the observed depth.

Most recent human pose estimation methods rely on 2D keypoints extracted from RGB data [1, 2]. The accuracy of these methods is high, mainly due to the availability of large annotated datasets [3, 4]. Building on the 2D keypoints and relying on RGB information only, many recent approaches perform either 2D pose estimation [1] or 3D pose estimation [5–9]. To tackle the difficulties of lifting 2D keypoints to 3D, some methods directly regress 3D keypoints or volumetric representations [10]. Recent approaches proceed further to estimate both the pose and the shape of the human body [11–14]. The work in [15] establishes dense correspondences between images and the 3D human body model. Approaches that rely on RGB information only either produce a scale-normalized output or rely on prior assumptions to determine the model's scale. In both cases, their applicability in a number of domains (e.g., robotics) is limited. To recover the full 3D human body pose in a real-world coordinate frame, most approaches rely on RGBD sensors.
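The dense evaluation step shown in Figure 1, which scores a pose hypothesis by its discrepancy from the observed depth, can be sketched as a truncated per-pixel comparison between the depth map rendered from the posed body model and the sensor depth map. This is an illustrative assumption about the form of such an objective, not the paper's exact function:

```python
import numpy as np

def depth_discrepancy(rendered, observed, tau=0.1):
    """Score a pose hypothesis by densely comparing the depth map
    rendered from the posed body model against the sensor depth.
    Pixels where either map is invalid (zero) are ignored, and
    per-pixel differences are truncated at tau metres to limit the
    influence of outliers. Lower scores indicate a better fit."""
    valid = (rendered > 0) & (observed > 0)
    if not valid.any():
        return np.inf  # no overlap: worst possible score
    diff = np.minimum(np.abs(rendered[valid] - observed[valid]), tau)
    return float(diff.mean())
```

Under such a scheme, the tracker would render each hypothesis, evaluate this score, and keep the hypothesis with the lowest discrepancy.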
The work in [16] that relies on ran-

16th International Conference on Machine Vision Applications (MVA), National Olympics Memorial Youth Center, Tokyo, Japan, May 27-31, 2019. © 2019 MVA Organization