Fast gesture recognition based on a two-level representation

J.P. Bandera *, R. Marfil, A. Bandera, J.A. Rodríguez, L. Molina-Tanco, F. Sandoval

Grupo ISIS, Dpto. Tecnología Electrónica, E.T.S.I. Telecomunicación, Universidad de Málaga, Campus de Teatinos s/n, 29071 Málaga, Spain

Article history: Received 23 December 2007; received in revised form 8 April 2009; available online 6 June 2009. Communicated by H.H.S. Ip.

Keywords: Gesture recognition; Adaptive curvature; 3D trajectory matching; Dynamic time warping

Abstract

Towards developing an interface for human–robot interaction, this paper proposes a two-level approach to recognise gestures which are composed of trajectories followed by different body parts. At the first level, individual trajectories are described by a set of key-points. These points are chosen as the corners of the curvature function associated with the trajectory, which is estimated using an adaptive, non-iterative scheme. This adaptive representation removes noise while preserving curvature detail at different scales. At the second level, gestures are characterised through global properties of the trajectories that compose them. Gesture recognition is performed using a confidence value that integrates both levels. Experimental results show that the proposed method performs well in terms of computational cost, memory consumption and gesture recognition ability.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The emerging field of human–robot interaction (HRI) represents an interdisciplinary effort that addresses the need to integrate social informatics, human factors, cognitive science and usability concepts into the design and development of robotic technology. The aim is the development of social robots. These robots are expected to work in human environments and to assist people in everyday tasks.
They should also adapt not only to real, dynamic environments, but also to different social interactions. In addition to facial expressions, non-verbal communication is often conveyed through gestures and body movement. In some tasks, it is possible to describe gestures using only the paths followed by significant body parts, such as the hands (Calinon and Billard, 2004). The movements of different body parts can be described by their associated 3D trajectories in Cartesian coordinates, which can be captured by the vision system of the robot. This description has been successfully used in a learning system which allows the robot to recognise and learn dual-hand gestures (Bandera et al., 2006). However, increasing the sampling rate, the gesture length or the number of tracked body parts can lead to excessively large descriptors. This problem can be solved by selecting a set of signatures to describe the trajectories associated with different body parts. Previous works have addressed this problem of trajectory representation using global trajectory signatures, which are defined in relation to an external reference (Croitoru et al., 2005), or using local trajectory signatures, which are based on differential measures (Rodriguez et al., 2004). The main advantage of global signatures is their robustness to outliers and noise; on the other hand, they face major difficulties in capturing fine details of trajectories (Alajlan et al., 2007). Local signatures are superior in discriminating fine details, but they are usually highly sensitive to outliers and noise.

The gesture recognition system also needs to address the problem of comparing the perceived gesture with a set of memorised ones. This matching stage must take into account the unique characteristics of 3D trajectory data, such as different sampling rates, outliers, or different sequence lengths (Croitoru et al., 2005).
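To make these matching requirements concrete, the following is a minimal dynamic time warping (DTW) sketch that aligns two 3D trajectories of different lengths. It is purely illustrative: the function name and sample data are our own, and this is a generic textbook formulation rather than the matching scheme developed in this paper.

```python
import math

def dtw_distance(traj_a, traj_b):
    """Return the accumulated cost of optimally warping two trajectories,
    each a list of (x, y, z) points, onto each other."""
    n, m = len(traj_a), len(traj_b)
    INF = float("inf")
    # cost[i][j]: best accumulated cost aligning the first i points of
    # traj_a with the first j points of traj_b
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])  # Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch traj_b
                                 cost[i][j - 1],      # stretch traj_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Two samplings of the same straight-line hand path at different rates:
coarse = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
fine = [(0.0, 0.0, 0.0), (0.25, 0.0, 0.0), (0.5, 0.0, 0.0),
        (0.75, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(dtw_distance(coarse, fine))  # small residual cost despite unequal lengths
```

Because the warping path may advance one sequence while holding the other, DTW tolerates the unequal lengths and uneven sampling mentioned above; note, however, that this aligns a single trajectory pair, whereas gesture matching must warp several body-part trajectories jointly.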
Currently, hidden Markov models (HMMs) can be considered the state-of-the-art modelling scheme for gesture recognition. They provide a robust and accurate framework which has been employed in previous related works (Calinon and Billard, 2004; Asfour et al., 2006). However, several shortcomings must be taken into account; in particular, the number and precision of gestures which can be modelled by HMMs is bounded by the time complexity of the training and inference algorithms. Other approaches commonly used to match 3D trajectories are the simpler dynamic programming alignment methods (Croitoru et al., 2005; Chen et al., 2005). These methods solve the problem by comparing the unmatched sequence of observations with a known sequence or training sample. This is useful for individual trajectory matching. However, in gesture matching, the problem remains of how to simultaneously warp the different trajectories of the body parts that compose the perceived gesture.

This work has been partially granted by the Spanish Government and FEDER funds (Project No. TIN2005-01359) and by the Junta de Andalucía (Project No. TIC2007-2123).

* Corresponding author. Tel.: +34 952 13 28 45; fax: +34 952 13 14 47. E-mail addresses: jpbandera@uma.es, jpbandera@gmail.com (J.P. Bandera).

Pattern Recognition Letters 30 (2009) 1181–1189. doi:10.1016/j.patrec.2009.05.017