Directional Space-Time Oriented Gradients for 3D Visual Pattern Analysis

Ehsan Norouznezhad 1,2, Mehrtash T. Harandi 1,2, Abbas Bigdeli 1,2, Mahsa Baktash 1,2, Adam Postula 1,2, and Brian C. Lovell 1,2

1 NICTA, P.O. Box 6020, St. Lucia, QLD 4067, Australia
2 The University of Queensland, School of ITEE, QLD 4072, Australia

Abstract. Various visual tasks, such as the recognition of human actions, gestures, and facial expressions and the classification of dynamic textures, require modeling and representing spatio-temporal information. In this paper, we propose representing space-time patterns using directional spatio-temporal oriented gradients. In the proposed approach, a 3D video patch is represented by a histogram of oriented gradients over nine symmetric spatio-temporal planes. Video comparison is achieved through a positive definite similarity kernel that is learnt by multiple kernel learning. A rich spatio-temporal descriptor with a simple trade-off between discriminatory power and invariance properties is thereby obtained. To evaluate the proposed approach, we consider three challenging visual recognition tasks, namely the classification of dynamic textures, human gestures, and human actions. Our evaluations indicate that the proposed approach attains significant improvements in recognition accuracy in comparison to state-of-the-art methods such as LBP-TOP, 3D-SIFT, HOG3D, tensor canonical correlation analysis, and dynamical fractal analysis.

1 Introduction

The goal of visual pattern recognition is to detect the presence of a particular object or pattern in a given image or video. This usually involves representing patterns in a suitable feature space to achieve robustness against a broad range of environmental changes, such as photometric variations, occlusions, background clutter, geometric transformations, and variations in view angle.
The study of space-time patterns such as human actions, gestures, facial expressions, and dynamic textures has attracted growing attention, mainly due to the wide range of applications in the real world [1]. One of the major themes of research in space-time pattern classification is the design of robust spatio-temporal local descriptors [1,2]. Broadly speaking, spatio-temporal local descriptors can be categorized into three main classes. The first and largest category comprises direct extensions of 2D local descriptors to 3D. The underlying idea is to replace the rectangular regions in 2D descriptors with 3D patches and recast the functions and procedures from the spatial domain (i.e., (x, y)) into the spatio-temporal domain (i.e., (x, y, t)). Examples include 3D-SIFT by Scovanner et al. [3], HOG3D by Klaser et al. [4], Volume Local Binary Patterns (VLBP) by Zhao et al. [5], and Extended-SURF (ESURF) by Willems et al. [6].

A. Fitzgibbon et al. (Eds.): ECCV 2012, Part III, LNCS 7574, pp. 736–749, 2012. © Springer-Verlag Berlin Heidelberg 2012
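As a rough illustration of what recasting a procedure from (x, y) to (x, y, t) can look like, the sketch below extends a central-difference image gradient to a video volume. It is a minimal example under stated assumptions, not the formulation used by any of the cited descriptors: the function names, the `vol[t][y][x]` indexing convention, and the choice of two angles (azimuth and elevation) to describe a 3D gradient direction are all hypothetical choices made here for illustration.

```python
import math


def gradient_2d(img, x, y):
    """Central-difference gradient (gx, gy) at an interior pixel of img[y][x]."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def gradient_3d(vol, x, y, t):
    """Central-difference gradient (gx, gy, gt) at an interior voxel of vol[t][y][x].

    The spatial part reuses the 2D routine on frame t; only the temporal
    derivative is new.
    """
    gx, gy = gradient_2d(vol[t], x, y)
    gt = (vol[t + 1][y][x] - vol[t - 1][y][x]) / 2.0
    return gx, gy, gt


def orientation_2d(gx, gy):
    """A 2D gradient direction is described by a single angle."""
    return math.atan2(gy, gx)


def orientation_3d(gx, gy, gt):
    """A 3D gradient direction needs two angles (assumed here: azimuth, elevation)."""
    azimuth = math.atan2(gy, gx)
    elevation = math.atan2(gt, math.hypot(gx, gy))
    return azimuth, elevation


# Synthetic 3x3x3 volume with intensity 2x + 3y + 5t, so the gradient is constant.
vol = [[[2 * x + 3 * y + 5 * t for x in range(3)] for y in range(3)] for t in range(3)]
print(gradient_3d(vol, 1, 1, 1))  # (2.0, 3.0, 5.0)
```

The point of the sketch is only that the spatial computation carries over unchanged while time enters as a third axis; how the resulting orientations are quantized and pooled into a histogram is where the individual descriptors differ.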