AUTOMATIC HUMAN MOTION ANALYSIS AND ACTION RECOGNITION IN ATHLETICS VIDEOS Costas Panagiotakis, Ilias Grinias, and Georgios Tziritas Department of Computer Science, University of Crete P.O. Box 2208, 71409, Heraklion, Greece phone: + (30) 2810 393517, fax: + (30) 2810 393501, email: {cpanag, grinias, tziritas}@csd.uoc.gr ABSTRACT We present an unsupervised, automatic human motion analysis and action recognition scheme tested on athletics videos. First, four major human points are recognized and tracked using human silhouettes that are computed by a robust camera estimation and object localization method. Statistical analysis of the tracked points' motion yields a temporal segmentation into running and jumping stages. The method is tested on athletics videos of pole vault, high jump, triple jump and long jump, which are recognized using features that are robust to the camera motion and independent of the athlete's performance. The experimental results indicate the good performance of the proposed scheme, even in sequences with complicated content and motion. 1. INTRODUCTION Human motion analysis using computer vision techniques has applications in many areas, such as the analysis of athletic events, surveillance, entertainment, user interfaces, and content-based image storage and retrieval. These systems attempt to detect, track and identify people, and to recognize their actions given a number of predefined actions. Thus, there has been a significant number of recent papers on human tracking and activity recognition. We can classify these systems into different categories according to the input data, the assumptions adopted, the method used and the output. Wang, Hu and Tan [10] emphasize three major issues of human motion analysis systems, namely human detection, tracking and activity understanding. According to them, there are 2D approaches, with or without explicit shape models, and 3D approaches. First, we consider 2D approaches. Wang et al.
[11] propose a method to recognize and track a walker using a 2D human model and both static and dynamic cues of body biometrics. Moreover, many systems use shape-from-silhouette methods to detect and track the human in 2D [6] or 3D space [2]. Silhouettes are easy to extract and provide valuable information about the position and shape of the person. When the camera is static, background subtraction techniques can give highly accurate human silhouettes. Otherwise, camera motion estimation methods [3] can locate the independently moving objects. Several approaches have been proposed recently in the literature for detecting video actions and activities using 2D or 3D motion capture data. Bobick and Davis [1] use a temporal template strategy: they interpret human motion in an image sequence using motion-energy images (MEI) and motion history images (MHI). Mori et al. [4] use 3D motion data and associate each action with a distinct feature detector and HMM, followed by hierarchical recognition. In [5], action recognition is performed using a probabilistic context-free grammar (PCFG) based on an automatic keyframe selection process.

Fig. 1: (a) Low-quality silhouette: the human boundary is inaccurate, the silhouette may be split into several segments, and spurious objects may appear. (b) Estimated human points: head center (green point), mass center (magenta point), left end of leg (blue point) and right end of leg (brown point). The human body major axis is shown as a red dashed line. (c) The four major human points. (d) The two characteristic angles: the human major-axis angle (A1) and the angle between the legs (A34).

Most of these works consider simple classes like running, walking and standing, using as input video sequences from a static camera in controlled environments. They thus obtain high-accuracy human silhouettes and high performance results.
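The MEI/MHI temporal-template idea of [1] can be illustrated with a minimal sketch, assuming a NumPy environment and a precomputed per-frame motion mask (the function name `update_mhi`, the duration parameter `tau`, and the toy masks below are our own, not from the cited work): pixels that moved in the current frame are stamped with the value `tau`, and all other pixels decay by one per frame, so recent motion is bright and older motion fades.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30):
    """One update step of a motion history image (temporal-template style).

    Pixels flagged as moving in the current frame are set to tau;
    every other pixel decays by one, floored at zero.
    """
    return np.where(motion_mask, tau, np.maximum(mhi - 1, 0))

# Toy example: a 1-D "frame" row, with motion sweeping left to right
# over three time steps (a hypothetical input, for illustration only).
mhi = np.zeros(5, dtype=int)
masks = [np.array([0, 1, 0, 0, 0], dtype=bool),
         np.array([0, 0, 1, 0, 0], dtype=bool),
         np.array([0, 0, 0, 1, 0], dtype=bool)]
for m in masks:
    mhi = update_mhi(mhi, m, tau=3)
print(mhi)  # the most recent motion keeps the highest value; older motion has decayed
```

Thresholding such an MHI at any positive value yields the corresponding MEI, which is why the two templates are usually computed together.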
A challenging problem appears when the camera is moving and the estimated human silhouettes are of low quality or plainly wrong (see Fig. 1(a)). In this work we focus on automatic human detection, tracking and action recognition in the real, dynamic environments of athletics meetings. We suppose that the camera tracks the athlete, and we test the algorithm on sports such as pole vault, high jump, triple jump and long jump. Furthermore, our method works when other humans appear in the scene. The main contribution of the method is that it works automatically, without any initialization or prior knowledge about the camera motion or human parameters, while also providing statistical results about the athlete's motion. Moreover, the proposed features, which are robust to the camera motion and independent of the athlete's performance, yield a high-performance action recognition method.

1.1 System Overview

The proposed architecture consists of two main modules. First, four major human points are recognized and tracked using the precomputed human silhouettes. The silhouettes are computed using a general-purpose algorithm for detecting and localizing the moving objects of a video. In general, the

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP
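The mass center and body major axis of Fig. 1(b) can be recovered from a binary silhouette using standard image moments; the sketch below is our own illustration of that classical computation, not the authors' implementation, and the vertical-bar silhouette is a hypothetical test input. The orientation comes from the second-order central moments, with the angle measured from the image x-axis.

```python
import numpy as np

def silhouette_axis(mask):
    """Mass center and major-axis angle of a binary silhouette.

    The center is the mean of the foreground pixel coordinates;
    the orientation uses the standard second-order central-moment
    formula: angle = 0.5 * atan2(2*mu11, mu20 - mu02).
    """
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), angle

# Hypothetical silhouette: a vertical bar, so the major axis
# should come out vertical (90 degrees from the x-axis).
mask = np.zeros((9, 9), dtype=bool)
mask[1:8, 4] = True
(cx, cy), angle = silhouette_axis(mask)
print(cx, cy, np.degrees(angle))
```

With a silhouette this clean the estimate is exact; on the noisy silhouettes of Fig. 1(a), moment-based estimates degrade gracefully, which is one reason such global measures are preferred over boundary-based ones when the segmentation is unreliable.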