Cross-view Activity Recognition using Hankelets

Binlong Li, Octavia I. Camps and Mario Sznaier
Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
http://robustsystems.ece.neu.edu

Abstract

Human activity recognition is central to many practical applications, ranging from visual surveillance to gaming interfaces. Most approaches addressing this problem are based on localized spatio-temporal features that can vary significantly when the viewpoint changes. As a result, their performance rapidly deteriorates as the difference between the viewpoints of the training and testing data increases. In this paper, we introduce a new type of feature, the "Hankelet", that captures dynamic properties of short tracklets. While Hankelets do not carry any spatial information, they bring invariance properties to changes in viewpoint that allow for robust cross-view activity recognition, i.e., when actions are recognized using a classifier trained on data from a different viewpoint. Our experiments on the IXMAS dataset show that using Hankelets improves the state-of-the-art performance by over 20%.

1. Introduction

Recognition of actions in video is central to many applications, including visual surveillance, assisted living for the elderly, and human-computer interfaces [1, 4, 17, 28]. A significant portion of the most recent work in activity recognition [5, 19, 20, 12] has been inspired by the success of bag-of-features (BoF) approaches for object recognition. Other approaches are based on time series, using trajectories or a combination of local features and trajectories [27, 23, 29, 14]. While these approaches are quite successful in recognizing actions captured from similar viewpoints, their performance suffers as the viewpoint changes, due to the inherent view dependence of the features used by these methods. In contrast, there is a smaller body of work addressing the problem of multi-view action recognition.
Some of these approaches rely on geometric constraints [32], body joint detection and tracking [22, 21], and 3D models [30, 31, 8, 15]. More recent approaches transfer features across views [16, 7] or use self-similarities as quasi-view-invariant features [10, 11]. However, the performance of these approaches is still far below that achieved for single-view activity recognition.

(This work was supported in part by NSF grants IIS–0713003 and ECCS–0901433, AFOSR grant FA9550–09–1–0253, and the Alert DHS Center of Excellence under Award Number 2008-ST-061-ED0001.)

1.1. Paper Contributions

In this paper, we propose Hankelets – the Hankel matrix of a short tracklet – as a new feature to use with a BoF approach to recognize activities across different viewpoints. Hankelets provide an alternative representation for activities that carries viewpoint invariance by capturing their dynamics instead of simple spatial gradient information. They are easy to extract and do not require camera calibration, 3D models, body joint detection, persistent tracking, or spatial feature matching. Because building a codebook of Hankelets requires comparisons of millions of these features, we also propose a simple and fast-to-compute dissimilarity score that can be used for this purpose. We tested the proposed approach on the IXMAS dataset [30], and our experiments show a performance improvement of 20% over the state of the art. A somewhat similar approach using bags of dynamic systems was proposed in [24] for view-invariant dynamic texture recognition. However, that approach used dense cubes of pixels and required nonlinear dimensionality reduction, system identification, and the solution of a Lyapunov equation. In contrast, our approach uses tracklets, does not require system identification or prior knowledge of the dynamics involved, and only requires computing matrix traces.

The paper is organized as follows.
Section 2 gives a brief summary of background material on dynamical systems and Hankel matrices. Section 3 gives the details of the proposed approach, and Section 4 discusses experimental results comparing the proposed approach against previously reported results. Finally, Section 5 gives final remarks.

2. Background: Hankel Matrices

Dynamic systems have been recently used in a wide range of computer vision applications, including dynamic texture recognition, target tracking, and activity recognition.

978-1-4673-1228-8/12/$31.00 ©2012 IEEE    1362
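The two ingredients introduced so far, forming the Hankel matrix of a short tracklet and comparing two Hankel matrices using only matrix traces, can be illustrated with a short sketch. This is a minimal illustration only: the function names, the block-Hankel layout, and the particular trace-based normalization are our assumptions for exposition, not the paper's definitions, which are given in the sections that follow.

```python
import numpy as np


def hankel_from_tracklet(points, num_block_rows):
    """Stack a tracklet (T x d array of positions) into a block-Hankel matrix.

    Column j holds the measurements at frames j, ..., j + num_block_rows - 1,
    so block anti-diagonals are constant (hypothetical layout for illustration).
    """
    points = np.asarray(points, dtype=float)
    num_frames, _ = points.shape
    cols = num_frames - num_block_rows + 1
    if cols < 1:
        raise ValueError("tracklet too short for the requested number of block rows")
    return np.column_stack(
        [points[j:j + num_block_rows].ravel() for j in range(cols)]
    )


def hankelet_dissimilarity(Ha, Hb):
    """Compare two Hankel matrices through their scale-normalized Gram matrices.

    Only Frobenius norms are needed, and ||M||_F^2 = trace(M M^T), so the
    score reduces to computing matrix traces.  The exact score used in the
    paper may differ; this is an illustrative stand-in.
    """
    Ga = Ha @ Ha.T
    Gb = Hb @ Hb.T
    Ga = Ga / np.linalg.norm(Ga, "fro")
    Gb = Gb / np.linalg.norm(Gb, "fro")
    return 2.0 - np.linalg.norm(Ga + Gb, "fro")
```

With this normalization the score is nonnegative and equals zero when the two tracklets differ only by a global scaling, which is the kind of nuisance variation a viewpoint change can introduce.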