Hierarchical Human Action Recognition by Normalized-Polar Histogram

Maryam Ziaeefard
Electrical Engineering Department, Sahand University of Technology, Tabriz, Iran
m_ziaeefard@sut.ac.ir

Hossein Ebrahimnezhad
Electrical Engineering Department, Sahand University of Technology, Tabriz, Iran
ebrahimnezhad@sut.ac.ir

Abstract- This paper proposes a novel human action recognition approach that represents each video sequence by a cumulative skeletonized image (CSI) computed over one action cycle. A normalized-polar histogram is computed for each CSI; each bin counts the CSI pixels that lie within a given range of distances and angles on the normalized circle. Human action is recognized by a two-level hierarchical classification. In the first level, coarse classification is performed using all bins of the histogram. In the second level, the most similar actions are examined again using selected bins, which completes the fine classification. A linear multi-class SVM is used as the classifier at both levels. The method is evaluated on the Weizmann dataset of real human actions, where it achieves an average recognition rate of 97.6%.

Keywords- Human Action Recognition; skeletonized image; SVM; Normalized Polar Histogram; feature selection

I. INTRODUCTION

Human action recognition has been at the center of attention of the computer vision and pattern recognition communities over the last decade. Demand for it is growing in applications such as video surveillance, human-computer interfaces, and monitoring of patients or elderly people. Appropriate and efficient feature extraction remains a problem that is not yet fully solved. There is a large body of existing research on human action recognition and motion analysis; a state-of-the-art survey is given in [1]. Existing work can be divided into two major categories: space-time approaches and sequential approaches.
The first category is subdivided into trajectory-based, space-time volume, and space-time feature methods; the second is divided into data-based and state-based methods. Space-time approaches have become more popular for human action recognition in the last decade.

Trajectory-based approaches represent an activity as a set of space-time trajectories. In these methods, a person is interpreted as a set of points corresponding to joint positions. As the person performs an action, the changes in joint positions are recorded as space-time trajectories, which together form the representation of the action [2].

A space-time volume, constructed by concatenating an image sequence, describes the changes in a person's shape and appearance during the execution of an activity. As in the trajectory-based approaches, human activities are recognized by matching the space-time volume of an input sequence against the activity models. Recognition is performed by measuring the similarity between two volumes. Instead of concatenating entire images along time, some approaches use only the silhouette to track shape changes. They treat an input video as a sequence of feature vectors and conclude that an activity has occurred in the video if they observe a particular sequence characterizing that activity [3].

Space-time feature approaches use local features extracted from 3-dimensional space-time volumes to represent and recognize activities. The system recognizes an activity by solving an object matching or object recognition problem [4].

Data-based approaches represent human activities by maintaining a template sequence or a set of sample sequences of action executions. When a new input video is given, these approaches compare the sequence of feature vectors extracted from the video with the template sequence. If their similarity is high enough, the system deduces that the input contains an execution of the activity [5].
The state-based approach is another popular approach to human action recognition. An action is modeled as a set of states in a state space using a Dynamic Probabilistic Network (DPN). The Hidden Markov Model (HMM), the most commonly used DPN, has advantages in modeling time-varying feature data [6].

In this paper, human motion is assumed to be periodic, so features are extracted over one period of each action. A skeletonized model is obtained in each frame. The frame with the minimum distance from the first frame is labeled, and the motion period is determined from it. All frames within one period of the action are then combined into one image, aligned through their centers of gravity. We call the resulting image the cumulative skeletonized image, or CSI. A bivariate histogram is then computed for each CSI. Its axes are r and θ, the normalized distance and the angle of the CSI's pixels from the center, respectively. This histogram is defined as the motion pattern of each action. The motivation for using the histogram is that the bins occupied in the polar histogram, taken with reference to the center of the person, differ between dissimilar actions, so actions can be discriminated simply from this information. Finally, the human action is classified by a hierarchical multi-class classifier using salient features.

The remainder of this paper is organized as follows. Section 2 describes the details of feature extraction.
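As an illustration of the normalized-polar histogram outlined in this introduction, the following is a minimal sketch and not the authors' implementation. It assumes the CSI is given as a list of foreground pixel coordinates, that the center is the centroid of those pixels, and that r is normalized by the largest pixel distance from the center. The function name `normalized_polar_histogram` and the bin counts `n_r` and `n_theta` are hypothetical choices made for this example; the paper's actual bin layout is described in its feature extraction section.

```python
import math


def normalized_polar_histogram(pixels, n_r=4, n_theta=8):
    """Count CSI pixels per (r, theta) bin around their centroid.

    pixels: iterable of (x, y) coordinates of foreground (skeleton) pixels.
    n_r, n_theta: illustrative bin counts, not values taken from the paper.
    Returns a list of n_r rows, each holding n_theta counts.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("CSI contains no foreground pixels")

    # Assumption: the reference center is the centroid of the CSI pixels.
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)

    # Polar coordinates of every pixel, with theta mapped into [0, 2*pi).
    polar = [
        (math.hypot(x - cx, y - cy),
         math.atan2(y - cy, x - cx) % (2 * math.pi))
        for x, y in pixels
    ]

    # Assumption: distances are normalized by the largest one, so r lies in [0, 1].
    r_max = max(r for r, _ in polar) or 1.0  # guard against a single-point CSI

    hist = [[0] * n_theta for _ in range(n_r)]
    for r, theta in polar:
        # Clamp so that r == r_max and rounding at 2*pi fall into the last bin.
        i = min(int(r / r_max * n_r), n_r - 1)
        j = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[i][j] += 1
    return hist
```

Flattening the returned rows into one vector gives a fixed-length feature that could be passed to a classifier. Because distances are divided by `r_max`, the histogram does not change when the figure is uniformly scaled, which is one plausible reading of "normalized" here.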
2010 International Conference on Pattern Recognition 1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.906 3708
Section