Hierarchical Human Action Recognition by Normalized-Polar Histogram
Maryam Ziaeefard
Electrical Engineering Department
Sahand University of Technology
Tabriz, Iran
m_ziaeefard@sut.ac.ir
Hossein Ebrahimnezhad
Electrical Engineering Department
Sahand University of Technology
Tabriz, Iran
ebrahimnezhad@sut.ac.ir
Abstract—This paper proposes a novel human action
recognition approach that represents each video sequence by
a cumulative skeletonized image (CSI) over one action
cycle. A normalized-polar histogram is computed for each CSI:
each bin counts the CSI pixels that lie within a given range
of distance and angle inside the normalized circle. Human
action is then recognized by hierarchical classification in two
levels. In the first level, coarse classification is performed
using all bins of the histogram. In the second level, the most
similar actions are examined again using selected bins, which
completes the fine classification. A linear multi-class SVM is
used as the classifier at both levels. The method is evaluated
on the real human action dataset Weizmann, on which it
achieves an average recognition rate of 97.6%.
Keywords- Human Action Recognition; skeletonized image;
SVM; Normalized Polar Histogram; feature selection
I. INTRODUCTION
Human action recognition has been at the center of
attention in the computer vision and pattern recognition
communities over the last decade. Demand for it is growing
significantly in applications such as video surveillance,
human-computer interfaces, and the monitoring of patients or
elderly people. Appropriate and efficient feature extraction,
however, remains a problem that is not yet fully solved.
In the literature, there are many research works on
human action recognition and motion analysis; a survey of the
state of the art is given in [1]. In general, existing approaches
can be divided into two major categories: space-time approaches
and sequential approaches. The former are subdivided into
trajectory-based, space-time volume, and space-time feature
methods, and the latter into data-based and state-based
methods. Space-time approaches have become increasingly
popular for human action recognition over the last decade.
Trajectory-based approaches represent an activity as a set of
space-time trajectories. In these methods, a person is
represented as a set of points corresponding to joint positions.
As the person performs an action, the changes in joint positions
are recorded as space-time trajectories, which form the
representation of the action [2]. A space-time volume,
constructed by concatenating the frames of an image sequence,
describes the changes in a person's shape and appearance during
an activity. As in the trajectory-based approaches, activities
are recognized by matching the space-time volume of an input
sequence against activity models, that is, by measuring the
similarity between two volumes. Instead of concatenating entire
images over time, some approaches use only silhouettes to track
shape changes. They treat an input video as a sequence of
feature vectors and conclude that an activity has occurred if
they observe a particular sequence that characterizes it [3].
Space-time feature approaches use local features extracted from
3-D space-time volumes to represent and recognize activities,
so that recognizing an activity becomes an object matching or
object recognition problem [4]. Data-based approaches represent
human activities by storing a template sequence or a set of
sample sequences of action executions. When a new input video
is given, they compare the sequence of feature vectors extracted
from it with the template sequence; if the similarity is high
enough, the system concludes that the input contains an
execution of the activity [5].
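The template comparison performed by data-based approaches can be illustrated with dynamic time warping (DTW), one common way to compare feature-vector sequences of different lengths. DTW, the Euclidean per-frame distance, and the `threshold` parameter are assumptions made for this sketch, since the text does not name a similarity measure; it is not the method of the cited work [5].

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def euclidean(a: Vec, b: Vec) -> float:
    """Assumed per-frame distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq: List[Vec], template: List[Vec]) -> float:
    """Minimum cumulative frame distance over all monotone alignments (DTW)."""
    n, m = len(seq), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq[i - 1], template[j - 1])
            # Extend the cheapest of: repeat a template frame, skip ahead, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def contains_activity(seq: List[Vec], template: List[Vec], threshold: float) -> bool:
    """Data-based decision: report the activity when the warped distance is small enough."""
    return dtw_distance(seq, template) <= threshold
```

Because DTW lets one template frame match several input frames, an execution performed more slowly than the template can still reach a low distance.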
The state-based approach is another popular approach to
human action recognition. An action is modeled as a set of
states in a state space using a Dynamic Probabilistic
Network (DPN). The Hidden Markov Model (HMM) is the most
commonly used DPN and is well suited to modeling
time-varying feature data [6].
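A state-based recognizer of this kind typically keeps one HMM per action and picks the action whose model gives the observed sequence the highest likelihood. The sketch below assumes discrete observation symbols (for example, quantized features) and parameters that have already been trained; the class and function names are hypothetical and the likelihood is computed with the standard forward algorithm.

```python
from typing import Dict, Sequence


class DiscreteHMM:
    """HMM with discrete observations; start/trans/emit are assumed already trained."""

    def __init__(self, start: Sequence[float],
                 trans: Sequence[Sequence[float]],
                 emit: Sequence[Sequence[float]]):
        self.start, self.trans, self.emit = start, trans, emit

    def likelihood(self, obs: Sequence[int]) -> float:
        """P(obs | model) via the forward algorithm.

        Long sequences need a scaled or log-space recursion to avoid underflow.
        """
        n = len(self.start)
        alpha = [self.start[s] * self.emit[s][obs[0]] for s in range(n)]
        for o in obs[1:]:
            # alpha[s] = P(observations so far, current state = s)
            alpha = [sum(alpha[p] * self.trans[p][s] for p in range(n)) * self.emit[s][o]
                     for s in range(n)]
        return sum(alpha)


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the action whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda action: models[action].likelihood(obs))
```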
In this paper, human motion is assumed to be periodic,
so features are extracted over one period of each action.
In each frame, a skeletonized model of the person is obtained.
The frame with the minimum distance from the first frame is
labeled, and the motion period is determined from it. All
frames within one period are then concatenated into a single
image by aligning their centers of gravity; we call the
resulting image the cumulative skeletonized image (CSI). Next,
a bivariate histogram is computed for each CSI. Its axes are
r and θ, the normalized distance and the angle of each CSI
pixel from the center, respectively. This histogram serves as
the motion pattern of each action.
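The feature-extraction steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: skeletonization is assumed to be done already (each frame is a set of skeleton pixel coordinates), the frame distance used for period detection is assumed to be the number of pixels that differ after centroid alignment, `min_lag` is a hypothetical parameter that keeps the first frame from matching itself or its immediate neighbors, the CSI is assumed to be the binary union of the aligned skeletons, and the "normalized circle" is assumed to be the smallest circle around the CSI center that encloses all its pixels. The bin counts `n_r` and `n_theta` are illustrative.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel in a binary skeleton frame


def centroid(pixels: Set[Pixel]) -> Tuple[float, float]:
    """Center of gravity of a non-empty set of skeleton pixels."""
    n = len(pixels)
    return sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n


def centered(pixels: Set[Pixel]) -> Set[Pixel]:
    """Translate a frame by its rounded center of gravity so the centroid sits near (0, 0)."""
    cr, cc = centroid(pixels)
    dr, dc = round(cr), round(cc)  # integer shift keeps distinct pixels distinct
    return {(r - dr, c - dc) for r, c in pixels}


def frame_distance(a: Set[Pixel], b: Set[Pixel]) -> int:
    """Assumed distance: number of pixels present in exactly one of the two aligned frames."""
    return len(centered(a) ^ centered(b))


def estimate_period(frames: List[Set[Pixel]], min_lag: int = 5) -> int:
    """Index k >= min_lag (hypothetical) of the frame closest to frame 0, used as the period."""
    return min(range(min_lag, len(frames)),
               key=lambda k: frame_distance(frames[0], frames[k]))


def cumulative_skeleton(frames: List[Set[Pixel]], period: int) -> Set[Pixel]:
    """Assumed CSI: union of the centroid-aligned skeletons over one action period."""
    csi: Set[Pixel] = set()
    for frame in frames[:period]:
        csi |= centered(frame)
    return csi


def normalized_polar_histogram(csi: Set[Pixel], n_r: int = 4, n_theta: int = 8) -> List[int]:
    """Count CSI pixels per (normalized radius, angle) bin around the CSI centroid."""
    cr, cc = centroid(csi)
    # Image rows grow downward, so the row offset is negated to get a conventional angle.
    polar = [(math.hypot(r - cr, c - cc), math.atan2(-(r - cr), c - cc)) for r, c in csi]
    r_max = max(d for d, _ in polar) or 1.0  # assumed radius of the normalizing circle
    hist = [0] * (n_r * n_theta)
    for d, theta in polar:
        i = min(int(d / r_max * n_r), n_r - 1)  # radial bin in [0, n_r)
        j = int(theta % (2 * math.pi) / (2 * math.pi) * n_theta) % n_theta  # angular bin
        hist[i * n_theta + j] += 1
    return hist
```

Because every radius is divided by the largest one, the assignment of pixels to bins does not depend on the overall size of the person in the image.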
The motivation for using this histogram is that the bins
occupied in the polar histogram, measured with respect to the
center of the person, differ between dissimilar actions, so such
actions can be distinguished easily using this information.
Finally, the human action is classified by a hierarchical
multi-class classifier using salient features.
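The two-level classification can be sketched as below, again as a hedged illustration rather than the paper's code. A one-vs-rest linear SVM trained with the Pegasos stochastic subgradient method stands in for the linear multi-class SVM, whose implementation the text does not name. The grouping of similar actions and the bins selected for each group (`groups`) are hypothetical inputs, since the text does not say which actions are grouped or which bins are kept; `lam` and `epochs` are illustrative values.

```python
import random
from typing import Dict, FrozenSet, List, Sequence


class LinearOvRSVM:
    """One-vs-rest linear SVM trained with Pegasos (stochastic subgradient on the hinge loss)."""

    def __init__(self, lam: float = 0.01, epochs: int = 200, seed: int = 0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.w: Dict[str, List[float]] = {}

    def fit(self, X: List[Sequence[float]], y: List[str]) -> "LinearOvRSVM":
        rng = random.Random(self.seed)
        data = [list(x) + [1.0] for x in X]  # constant 1.0 feature acts as a (regularized) bias
        for label in sorted(set(y)):
            w = [0.0] * len(data[0])
            t = 0
            for _ in range(self.epochs):
                for i in rng.sample(range(len(data)), len(data)):  # shuffled pass over the data
                    t += 1
                    eta = 1.0 / (self.lam * t)
                    target = 1.0 if y[i] == label else -1.0
                    margin = target * sum(wk * xk for wk, xk in zip(w, data[i]))
                    w = [(1.0 - eta * self.lam) * wk for wk in w]  # shrink: L2 regularization step
                    if margin < 1.0:  # hinge loss is active: move toward the correct side
                        w = [wk + eta * target * xk for wk, xk in zip(w, data[i])]
            self.w[label] = w
        return self

    def predict(self, x: Sequence[float]) -> str:
        z = list(x) + [1.0]
        return max(self.w, key=lambda c: sum(wk * zk for wk, zk in zip(self.w[c], z)))


class HierarchicalClassifier:
    """Level 1 uses all bins; level 2 re-decides inside a group of similar actions."""

    def __init__(self, groups: Dict[FrozenSet[str], List[int]]):
        self.groups = groups  # hypothetical: group of similar actions -> selected bin indices
        self.level1 = LinearOvRSVM()
        self.level2: Dict[FrozenSet[str], LinearOvRSVM] = {}

    def fit(self, X: List[Sequence[float]], y: List[str]) -> "HierarchicalClassifier":
        self.level1.fit(X, y)  # coarse classifier over every action, all histogram bins
        for group, bins in self.groups.items():
            idx = [i for i, label in enumerate(y) if label in group]
            self.level2[group] = LinearOvRSVM().fit(
                [[X[i][b] for b in bins] for i in idx], [y[i] for i in idx])
        return self

    def predict(self, x: Sequence[float]) -> str:
        label = self.level1.predict(x)
        for group, bins in self.groups.items():
            if label in group:  # confusable action: refine using the selected bins only
                return self.level2[group].predict([x[b] for b in bins])
        return label
```

For example, `HierarchicalClassifier({frozenset({"action_a", "action_b"}): [3, 5, 11]})` would send any sample that level 1 labels `action_a` or `action_b` to a second SVM that sees only bins 3, 5 and 11; the action names and bin indices are placeholders.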
The remainder of this paper is organized as follows.
Section 2 describes the details of feature extraction. Section
2010 International Conference on Pattern Recognition
1051-4651/10 $26.00 © 2010 IEEE
DOI 10.1109/ICPR.2010.906
3708