FlowCap: 2D Human Pose from Optical Flow

Javier Romero, Matthew Loper, Michael J. Black

Max Planck Institute for Intelligent Systems, Tübingen, Germany

Fig. 1: FlowCap overview. a. Example frame from a video sequence shot with a phone camera. b. Optical flow computed with GPU flow [1]. c. Per-pixel part assignments based on flow with overlaid uncertainty ellipses (red). d. Predicted 2D part centroids connected in a tree.

Abstract. We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but, unlike depth, it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.

1 Introduction

Human pose estimation from monocular video has been extensively studied but currently there are no widely available, general, efficient, and reliable solutions.
The problem is challenging due to the high dimensionality of articulated human pose, the complexity of human motion, and the variability of human appearance in images due to clothing, lighting, camera view, and self-occlusion. There has been extensive work on 2D human pose estimation using part-based models [8, 11,