Investigating the Use of Space-Time Primitives to Understand Human Movements

Damiano Malafronte 1, Gaurvi Goyal 1, Alessia Vignolo 1,2, Francesca Odone 1, and Nicoletta Noceti 1(✉)

1 Università degli Studi di Genova, Genova, Italy
{damiano.malafronte,gaurvi.goyal}@dibris.unige.it, alessia.vignolo@iit.it, {francesca.odone,nicoletta.noceti}@unige.it
2 Istituto Italiano di Tecnologia, Genova, Italy

Abstract. In this work we start investigating the use of appropriately learnt space-time primitives for modeling upper body human actions. As a case study we consider cooking activities, which may undergo large intra-class variations and are characterized by subtle details, observed from different viewpoints. With a BoK procedure we quantize each video frame with respect to a dictionary of meaningful space-time primitives, then we derive time series that measure how the presence of different primitives evolves over time. The preliminary experiments we report are very encouraging regarding the discriminative power of the representation, and also speak in favor of its tolerance to viewpoint changes.

Keywords: Spatio-temporal interest points · Motion primitives · Multi-view motion analysis · Multi-view action analysis · Shearlet transform

1 Introduction

Understanding human motion and its regularities is a key research goal of Human-Machine Interaction (HMI), with the potential to unlock more refined abilities, such as the anticipation of action goals, and thus the design of intelligent machines able to collaborate proficiently and effectively with humans [1, 2]. In this ongoing work we are interested in investigating HMI functionalities, where a machine (e.g. a robot) observes a human performing tasks and learns how to discriminate among those characterized by different dynamic properties [3]. We consider upper body human action primitives taking place in a specific setting, cooking in our case.
For the time being, we restrict our attention to the actor and do not exploit any contextual information that could be derived, for instance, from the presence of a tool or an object.

For some time now we have witnessed a growing interest in so-called space-time key-points. Since the pioneering work of Laptev [4], who proposed an extension of corner points to space-time, soon followed by alternative and possibly richer approaches [5, 6], we have appreciated the power of these key-points as low-level building blocks for motion analysis and action recognition.

© Springer International Publishing AG 2017
S. Battiato et al. (Eds.): ICIAP 2017, Part I, LNCS 10484, pp. 40–50, 2017.
https://doi.org/10.1007/978-3-319-68560-1_4