Correcting Pose Estimation with Implicit Occlusion Detection and Rectification

Ibrahim Radwan 1, Abhinav Dhall 2, Roland Goecke 1,2
1 University of Canberra, 2 Australian National University
ibrahim.radwan@canberra.edu.au, abhinav.dhall@anu.edu.au, roland.goecke@ieee.org

Abstract

Recently, articulated pose estimation methods based on the pictorial structure framework have received much attention in computer vision. However, the performance of these approaches has been limited by the presence of self-occlusion. This paper addresses the problem of handling self-occlusion in the pictorial structure framework. We propose an exemplar-based framework for implicit occlusion detection and rectification. Our framework can be applied as a general post-processing plug-in after any pose estimation approach to rectify errors due to self-occlusion and to improve accuracy. The proposed framework outperforms a state-of-the-art pictorial structure approach for human pose estimation on the HumanEva dataset.

1 Introduction

Articulated pose estimation based on the Pictorial Structure (PS) framework has attracted much attention for a large variety of applications, such as automotive safety, surveillance, pose search and video indexing. PS models represent an object as a graph, where each node represents a body part and the edges between nodes encode the kinematic constraints between connected pairs of parts. Significant progress has been achieved [1, 8, 16], but highly articulated objects (e.g. human bodies) contain many self-occluded parts, resulting in less accurate pose estimation and detection. There are two types of occlusion: 1) self-occlusion, caused by the object itself due to large degrees of freedom, different camera views or different poses; 2) inter-occlusion between different objects in the same image.
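To make the PS representation above concrete, the following is a minimal sketch, not the paper's implementation: body parts as nodes of a kinematic tree, with a quadratic pairwise deformation cost penalising each part's deviation from its expected offset relative to its parent. All part names, offsets and the cost form are illustrative assumptions.

```python
# Hypothetical sketch of a pictorial structure (PS) model: nodes are body
# parts, edges encode kinematic constraints between connected pairs of parts.
# Part names and the quadratic cost are illustrative, not the paper's model.

# Kinematic tree as a child -> parent map (a minimal upper-body example).
PARENT = {
    "head": "torso",
    "l_upper_arm": "torso",
    "r_upper_arm": "torso",
    "l_lower_arm": "l_upper_arm",
    "r_lower_arm": "r_upper_arm",
}

def deformation_cost(positions, rest_offsets):
    """Sum over tree edges of the squared deviation of each part from its
    expected location relative to its parent (a common PS pairwise term).

    positions:    {part: (x, y)} for every part, including the root "torso".
    rest_offsets: {part: (dx, dy)} expected offset of each part from its parent.
    """
    cost = 0.0
    for part, parent in PARENT.items():
        px, py = positions[parent]
        cx, cy = positions[part]
        ox, oy = rest_offsets[part]
        cost += (cx - (px + ox)) ** 2 + (cy - (py + oy)) ** 2
    return cost
```

A pose that matches every rest offset exactly incurs zero deformation cost; displacing any part (e.g. an occluded limb drifting from its kinematically plausible location) increases the cost quadratically, which is what makes such terms tractable to minimise with distance transforms in standard PS inference.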
In this paper, we focus on the former and propose a robust exemplar-based framework to rectify human pose estimates in highly self-occluded scenes. The proposed method is inspired by the success of Bag-of-Visual-Words approaches in pose estimation and action recognition. However, we use the entire pose as both the visual word and the document. We cluster the training exemplars and compose a codebook of key poses. With each entry in the codebook, we store the corresponding occluded parts.

This paper's contributions are solutions to the following three key questions: 1) How can we detect occlusion in a given image? 2) If there is occlusion, how can we identify the (body) parts responsible for it? 3) How can we rectify an occluded part's position? To this end, we introduce (1) a general framework for self-occlusion detection, which reduces the search space of occluded parts, and (2) an approach for rectifying the PS parameters of occluded parts in highly articulated poses that can work with any PS model, making it more robust to self-occlusion and allowing us to accurately estimate the pose from monocular images.

The occlusion detector is based on two binary discriminative non-linear SVM classifiers that detect the occluded parts in the upper and lower body regions. We iteratively match the input pose with the codebook exemplars and select the nearest neighbour. The corresponding occluded parts are replaced iteratively to rectify the PS parameters. The proposed rectification step is based on matching the PS parameters against the ground truth from labelled training exemplars.

2 Related Work

The Pictorial Structure idea dates back to Fischler et al. [10]. Felzenszwalb et al. [8] presented the deformation cost for the PS framework, which relied on a simple appearance model and required background subtraction. The limitation of this method is its inaccuracy in the presence of cluttered and dynamic backgrounds.
Andriluka et al. [1] overcame this problem by using a discriminative appearance model. They interpreted the normalised margin of each part as the appearance likelihood for that part. Although this produced a general framework for both object detection and articulated