2006 Special issue

Perceiving the unusual: Temporal properties of hierarchical motor representations for action perception

Yiannis Demiris *, Gavin Simmons

Biologically Inspired Autonomous Robots Team (BioART), Intelligent Systems and Networks Group, Department of Electrical and Electronic Engineering, Imperial College London, South Kensington Campus, London SW7 2BT, UK

Abstract

Recent computational approaches to action imitation have advocated the use of hierarchical representations in the perception and imitation of demonstrated actions. Hierarchical representations present several advantages, with the main one being their ability to process information at multiple levels of detail. However, the nature of the hierarchies in these approaches has remained relatively unsophisticated, and their relation with biological evidence has not been investigated in detail, in particular with respect to the timing of movements. Following recent neuroscience work on the modulation of the premotor mirror neuron activity during the observation of unpredictable grasping movements, we present here an implementation of our HAMMER architecture using the minimum variance model for implementing reaching and grasping movements that have biologically plausible trajectories. Subsequently, we evaluate the performance of our model in matching the temporal dynamics of the modulation of cortical excitability during the passive observation of normal and unpredictable movements of human demonstrators.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Imitation; Minimum variance; Hierarchical structures; Action recognition; Corticospinal excitability

1. Introduction

An increased interest in computational mechanisms that will allow robots to observe, imitate and learn from human actions has resulted in a number of computational architectures that allow the matching of demonstrated actions to the observer robot’s equivalent motor representations (Alissandrakis, Nehaniv, & Dautenhahn, 2002; Billard, 2000; Demiris & Hayes, 2002; Schaal, Ijspeert, & Billard, 2003). These architectures, whilst sharing common computational components such as modules for processing and classifying visual information and retrieving motor representations, differ in the way that the perceptual information is coded and classified, the organisation of the motor system, and the stage at which the motor representations are used. The final aspect, the stage at which the motor representations are used, differentiates architectures that follow the general ‘observe, classify, imitate’ decomposition (Kuniyoshi, Inaba, & Inoue, 1994) from those that advocate a stronger involvement of the motor systems in the perception process, through a ‘rehearse, predict, observe, reinforce’ decomposition (Demiris & Hayes, 2002; Demiris & Johnson, 2003; Schaal et al., 2003). In the latter, the observer robot invokes its motor systems to rehearse potential actions, predicting and confirming incoming observed states during the demonstration. This approach has gained biological credibility with the discovery of the mirror system in monkeys and humans (Grezes, Armony, Rowe, & Passingham, 2003; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996). Not all theoretical models advocate the actual rehearsal of candidate actions as our previous work has done (Demiris & Hayes, 2002), opting instead for a weaker version of this motor theory of perception, usually termed ‘motor resonance’, in which the motor representations are retrieved through a resonance mechanism rather than a generative mechanism.
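To make the ‘rehearse, predict, observe, reinforce’ decomposition concrete, the following is a minimal illustrative sketch in Python. It is not the HAMMER implementation: the one-dimensional state, the `CandidateAction` class, the proportional controller in `make_reach`, the fixed `tolerance`, and the ±1 confidence update are all assumptions chosen only to show the shape of the loop, in which each candidate action proposes a command, predicts the next state, and is rewarded or penalised according to how well that prediction matches what is then observed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = float  # hypothetical 1-D state, e.g. hand position along one axis


@dataclass
class CandidateAction:
    """A rehearsable action: a controller paired with a predictor (illustrative)."""
    name: str
    inverse_model: Callable[[State], float]         # state -> motor command
    forward_model: Callable[[State, float], State]  # (state, command) -> predicted next state
    confidence: float = 0.0


def observe_demonstration(candidates: List[CandidateAction],
                          observed: List[State],
                          tolerance: float = 0.05) -> Dict[str, float]:
    """Run all candidates in parallel over a sequence of observed states."""
    for current, actual_next in zip(observed, observed[1:]):
        for cand in candidates:
            command = cand.inverse_model(current)             # rehearse
            predicted = cand.forward_model(current, command)  # predict
            error = abs(predicted - actual_next)              # observe
            # reinforce: assumed update rule, +1 on a match and -1 otherwise
            cand.confidence += 1.0 if error <= tolerance else -1.0
    return {c.name: c.confidence for c in candidates}


def make_reach(name: str, target: State, gain: float = 0.5) -> CandidateAction:
    """Hypothetical reaching action: move a fixed fraction of the way to the target."""
    return CandidateAction(
        name=name,
        inverse_model=lambda s: gain * (target - s),
        forward_model=lambda s, u: s + u,
    )


def demo() -> Dict[str, float]:
    # Synthetic demonstration: the demonstrator reaches towards position 1.0.
    observed = [0.0]
    for _ in range(5):
        observed.append(observed[-1] + 0.5 * (1.0 - observed[-1]))
    candidates = [make_reach("reach_right", 1.0), make_reach("reach_left", -1.0)]
    return observe_demonstration(candidates, observed)


if __name__ == "__main__":
    print(demo())
```

In this toy setting the candidate whose predictions track the demonstration accumulates confidence, while the competing candidate loses it; the action with the highest confidence at the end is taken as the recognised one. A ‘motor resonance’ variant would retrieve the representation without generating commands and predictions in this way.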
For imitation approaches that advocate the use of motor systems during the perception stage, it becomes crucial to have a clear and flexible motor system organisation. Hierarchical representations, with primitive motor structures at the lowest level and increasingly complex structures at higher levels, have been proposed (Demiris & Johnson, 2003; Wolpert, Doya, & Kawato, 2003) and tested in robotic systems (Demiris & Johnson, 2003), which successfully learned and used sequences of actions by observation. However, little has been done with respect to the temporal dimension of these representations, including how they can be coordinated, as well as their relation to biological data.

Neural Networks 19 (2006) 272–284
www.elsevier.com/locate/neunet
0893-6080/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neunet.2006.02.005
* Corresponding author. E-mail address: y.demiris@imperial.ac.uk (Y. Demiris). URL: http://www.iis.ee.ic.ac.uk/yiannis (Y. Demiris).