Cross-view Activity Recognition using Hankelets
Binlong Li, Octavia I. Camps and Mario Sznaier∗
Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
http://robustsystems.ece.neu.edu
Abstract
Human activity recognition is central to many practical
applications, ranging from visual surveillance to gaming
interfaces. Most approaches addressing this problem are
based on localized spatio-temporal features that can vary
significantly when the viewpoint changes. As a result, their
performance deteriorates rapidly as the difference between
the viewpoints of the training and testing data increases. In
this paper, we introduce a new type of feature, the “Han-
kelet” that captures dynamic properties of short tracklets.
While Hankelets do not carry any spatial information, they
exhibit properties that are invariant to changes in viewpoint,
allowing for robust cross-view activity recognition, i.e., when
actions are recognized using a classifier trained on data
from a different viewpoint. Our experiments on the IXMAS
dataset show that using Hankelets improves the state-of-the-
art performance by over 20%.
1. Introduction
Recognition of actions in video is central to many ap-
plications, including visual surveillance, assisted living for
the elderly, and human computer interfaces [1, 4, 17, 28]. A
significant portion of the most recent work in activity recog-
nition [5, 19, 20, 12] has been inspired by the success of
using bag of features (BoF) approaches for object recogni-
tion. Other approaches are based on time-series using tra-
jectories or a combination of local features and trajectories
[27, 23, 29, 14]. While these approaches are quite success-
ful in recognizing actions captured from similar viewpoints,
their performance suffers as the viewpoint changes due to
the inherent view dependence of the features used by these
methods.
In contrast, there is a smaller body of work address-
ing the problem of multi-view action recognition. Some
of these approaches rely on geometric constraints [32],
body joint detection and tracking [22, 21], and 3D models
[30, 31, 8, 15]. More recent approaches transfer features
across views [16, 7] or use self-similarities as quasi-view-
invariant features [10, 11]. However, the performance of
these approaches is still well below that achieved for
single-view activity recognition.

∗This work was supported in part by NSF grants IIS–0713003 and
ECCS–0901433, AFOSR grant FA9550–09–1–0253, and the ALERT DHS
Center of Excellence under Award Number 2008-ST-061-ED0001.
1.1. Paper Contributions
In this paper, we propose Hankelets – the Hankel ma-
trix of a short tracklet – as a new feature to use with a BoF
approach to recognize activities across different viewpoints.
Hankelets provide an alternative representation for activities
that carries viewpoint invariance by capturing their dynam-
ics instead of simple spatial gradient information. They are
easy to extract and do not require camera calibration, 3D
models, body joint detection, persistent tracking or spatial
feature matching. Because building a codebook of Han-
kelets requires comparisons of millions of these features,
we also propose a simple, fast-to-compute dissimilarity
score that can be used for this purpose. We tested the
proposed approach with the IXMAS dataset [30] and our
experiments show a performance improvement of 20% over
the state of the art. A somewhat similar approach using
bags of dynamic systems was proposed in [24] for view-
invariant dynamic texture recognition. However, their ap-
proach used dense cubes of pixels, required nonlinear di-
mensionality reduction, system identification and solving a
Lyapunov equation. In contrast, our approach uses track-
lets, does not require system identification or prior knowl-
edge of the dynamics involved and only requires computing
matrix traces.
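The core ideas above can be illustrated with a short NumPy sketch: a Hankelet is formed by stacking overlapping, one-frame-shifted windows of a tracklet's (x, y) coordinates into a block-Hankel matrix, and two Hankelets can then be compared without system identification. The function names, the window length, and the normalized Frobenius-norm score below are illustrative assumptions, not the paper's exact formulation, which is defined in Section 3.

```python
import numpy as np

def hankel_matrix(tracklet, num_block_rows):
    """Build a block-Hankel matrix from a short 2D tracklet.

    tracklet: (T, 2) array of (x, y) positions over T frames.
    Column j stacks the positions at frames j, j+1, ..., j+num_block_rows-1,
    so consecutive columns are one-frame-shifted windows of the trajectory.
    """
    T = tracklet.shape[0]
    cols = T - num_block_rows + 1
    # Block i contributes the (x, y) samples shifted by i frames.
    return np.vstack([
        tracklet[i:i + cols].T          # 2 x cols block for shift i
        for i in range(num_block_rows)
    ])

def dissimilarity(H1, H2):
    """A sketch of a normalization-based dissimilarity between Hankelets.

    Gram matrices are Frobenius-normalized, so identical dynamics yield a
    score near 0 and unrelated dynamics yield a larger score (up to 2).
    """
    G1 = H1 @ H1.T
    G2 = H2 @ H2.T
    G1 = G1 / np.linalg.norm(G1, 'fro')   # normalize out scale/amplitude
    G2 = G2 / np.linalg.norm(G2, 'fro')
    return 2.0 - np.linalg.norm(G1 + G2, 'fro')
```

Note that the comparison involves only matrix products and norms, which is what makes exhaustive comparisons over millions of tracklets feasible when building a codebook.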
The paper is organized as follows. Section 2 gives a brief
summary of background material on dynamical systems and
Hankel matrices. Section 3 details the proposed approach,
and Section 4 discusses experimental results comparing the
proposed approach against previously reported results.
Finally, Section 5 gives concluding remarks.
2. Background: Hankel Matrices
Dynamical systems have recently been used in a wide
range of computer vision applications, including dynamic
texture recognition, target tracking, and activity recogni-