FlowCap: 2D Human Pose from Optical Flow

Javier Romero, Matthew Loper, Michael J. Black

Max Planck Institute for Intelligent Systems, Tübingen, Germany

Fig. 1: FlowCap overview. a. Example frame from a video sequence shot with a phone camera. b. Optical flow computed with GPU flow [1]. c. Per-pixel part assignments based on flow with overlaid uncertainty ellipses (red). d. Predicted 2D part centroids connected in a tree.

Abstract. We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but unlike depth it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.

1 Introduction

Human pose estimation from monocular video has been extensively studied but currently there are no widely available, general, efficient, and reliable solutions. The problem is challenging due to the dimensionality of articulated human pose, the complexity of human motion, and the variability of human appearance in images due to clothing, lighting, camera view, and self-occlusion.

There has been extensive work on 2D human pose estimation using part-based models [8, 11,