Computational Analysis of Mannerism Gestures

Kanav Kahol*, Priyamvada Tripathi, Sethuraman Panchanathan
Center for Cognitive Ubiquitous Computing, Department of Computer Science and Engineering
Fulton College of Engineering and Sciences
Arizona State University, Tempe, Arizona 85281, USA
kanav@asu.edu

Abstract

Humans perform various gestures in everyday life. While some of these gestures are well understood within a community (such as "hello" and "goodbye"), many gestures and movements are typical of an individual's style, body language or mannerisms. Examples of such gestures include the manner in which a person laughs, the hand gestures used in conversation, or the manner in which a person performs a dance sequence. Individuals possess a large vocabulary of mannerism gestures. Conventional modeling of gestures as a series of poses for the purpose of automatic recognition is inadequate for mannerism gestures. In this paper we propose a novel method to model mannerism gestures. Gestures are modeled as a sequence of events that take place within the segments and joints of the human body. Each gesture is then represented in an event-driven coupled hidden Markov model (HMM) as a sequence of events occurring in the various segments and joints. The inherent advantage of using an event-driven coupled HMM (instead of a pose-driven HMM) is that there is no need to add states to represent more complex gestures, or to increase the number of states when another individual is added. When this model was tested on a library of 185 gestures, created by 7 subjects, the algorithm achieved an average recognition accuracy of 90.2%.

1. Introduction

Many gestures are widely accepted in a society. Examples of these gestures include "hello" and "waving goodbye". Typically such gestures (1) are few in number, and (2) convey syntax and semantics of motion that are understood across a large population.
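To make the abstract's notion of per-joint "events" concrete, the following minimal sketch detects events in a single joint's trajectory. The event definition used here, a change in the direction of motion (a velocity zero-crossing), is purely our assumption for illustration; the paper's actual event definition may differ.

```python
# Hypothetical sketch: extracting motion "events" from one joint's 1-D trajectory.
# An "event" is assumed here to be a reversal in direction of motion;
# this definition is illustrative, not the paper's.

def extract_events(trajectory):
    """Return (frame_index, event_type) pairs for a 1-D joint trajectory."""
    events = []
    for t in range(1, len(trajectory) - 1):
        prev_v = trajectory[t] - trajectory[t - 1]   # velocity entering frame t
        next_v = trajectory[t + 1] - trajectory[t]   # velocity leaving frame t
        if prev_v > 0 and next_v <= 0:
            events.append((t, "peak"))      # motion reverses downward
        elif prev_v < 0 and next_v >= 0:
            events.append((t, "valley"))    # motion reverses upward
    return events

# Example: a wrist-height trace during a wave-like motion
wrist_y = [0.0, 0.2, 0.5, 0.4, 0.1, 0.3, 0.6, 0.2]
print(extract_events(wrist_y))  # → [(2, 'peak'), (4, 'valley'), (6, 'peak')]
```

A gesture would then be described by the sequence of such events across all tracked segments and joints, rather than by a sequence of full-body poses.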
There is, however, a larger class of gestures that are performed in daily life and are associated with a single individual. Such gestures are typically representative of an individual's body language, style and mannerisms. Examples of such gestures include the movements associated with laughter, the manner in which an individual converses (stressing a word with hand gestures), and complex motions such as dance sequences. Such gestures and movements have been referred to in the literature as mannerisms, body language or stylistic gestures. This paper presents an approach to effectively model mannerism gestures for the purpose of automatic recognition.

2. Related Work

Pose-driven hidden Markov models (HMMs) have been the most popular method for gesture segmentation and gesture recognition. These models represent gestures as a probabilistic sequence of static poses. Gestures are captured in video or 3D motion capture data, and distinctive poses are empirically chosen to model the gestures; these poses vary from one gesture to another. Since each pose corresponds to a state of the model, this severely limits the number of gestures the model can recognize, as humans can assume an infinite variety of poses. An HMM is then trained on multiple motion sequences by computing the transition probabilities between these poses. Bobick [1] and Campbell [2] mapped Cartesian tracking data (captured from sensors on body joints) onto a body hierarchy for activity recognition. The trajectory data of the joints are represented in a high-dimensional phase space, and points in this space (or one of its subspaces) are employed to recognize gestures. The goal of both researchers is to transform continuous motion in 3D space into a set of discrete symbol representations, each of which (1) corresponds to a point in a high-dimensional phase space, and (2) can be used to detect the start of each gesture. This approach is analogous to pose-based modeling of gestures.
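The pose-driven scheme described above can be sketched in a few lines: each frame's feature vector is quantized to the nearest empirically chosen pose (the HMM's states), transition probabilities are estimated by counting, and a candidate sequence is scored by its log-likelihood under each gesture's transition model. The pose centroids and data below are invented for illustration; a real system would use higher-dimensional features and full HMM training.

```python
import math

# Illustrative sketch of pose-driven gesture modeling: frames are quantized
# to hand-picked pose centroids, and a transition matrix is estimated by
# counting. Centroids and sequences here are made up for demonstration.

POSES = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}  # hypothetical 2-D pose centroids

def nearest_pose(frame):
    """Assign a frame's feature vector to the closest pose state."""
    return min(POSES, key=lambda p: sum((a - b) ** 2 for a, b in zip(frame, POSES[p])))

def train_transitions(sequences, smooth=1.0):
    """Estimate a pose-to-pose transition matrix from training sequences."""
    n = len(POSES)
    counts = [[smooth] * n for _ in range(n)]  # additive smoothing avoids log(0)
    for seq in sequences:
        states = [nearest_pose(f) for f in seq]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_likelihood(seq, trans):
    """Score how well a motion sequence fits a gesture's transition model."""
    states = [nearest_pose(f) for f in seq]
    return sum(math.log(trans[a][b]) for a, b in zip(states, states[1:]))

# Train on two noisy repetitions of the same (made-up) gesture, then score
# a new sequence; recognition would pick the gesture model with the
# highest log-likelihood.
demo = [[(0.0, 0.0), (1.0, 0.1), (0.9, 1.0)],
        [(0.1, 0.0), (1.0, 0.0), (1.0, 0.9)]]
trans = train_transitions(demo)
print(log_likelihood([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], trans))
```

The limitation the paper raises is visible here: every distinct pose must be enumerated as a state up front, so richer gestures or new individuals force the state set to grow.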
A point in the phase space represents the feature vector of each frame in a motion sequence. Depending upon the region of space in which it resides, a point is assigned to a particular state. A typical gesture is thereby defined as an ordered sequence of these states, restricted by motion constraints. In Campbell's work [2], the learning/training process fits a unique predictor curve (representing each gesture) into a subspace of the full phase space, using low-order polynomials. Gesture recognition is then based on the maximum correlation between the various predictor curves and the motion being analyzed. Bobick [1], on the

Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04) 1051-4651/04 $20.00 IEEE
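Campbell's predictor-curve recognition step can be sketched as follows: each gesture is summarized by a low-order polynomial over normalized time, and an observed trajectory is classified by the curve it correlates with most strongly. The gesture names and polynomial coefficients below are invented for illustration, and a single 1-D trajectory stands in for the full phase-space subspace.

```python
import math

# Hedged sketch of predictor-curve recognition: each gesture has a
# low-order polynomial "predictor curve", and recognition picks the
# curve with maximum correlation to the observed motion. Gesture names
# and coefficients are hypothetical.

PREDICTORS = {
    "wave":  [0.0, 1.0, -1.0],   # y = t - t^2  (rises, then falls)
    "point": [0.0, 1.0, 0.0],    # y = t        (monotonic rise)
}

def poly(coeffs, t):
    """Evaluate a polynomial with the given coefficients at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def recognize(observed):
    """Classify an observed trajectory by its best-matching predictor curve."""
    ts = [i / (len(observed) - 1) for i in range(len(observed))]
    scores = {g: correlation(observed, [poly(c, t) for t in ts])
              for g, c in PREDICTORS.items()}
    return max(scores, key=scores.get)

print(recognize([0.0, 0.18, 0.24, 0.18, 0.0]))  # rises then falls → "wave"
```

In the actual method the curves are fitted by least squares to training trajectories in a phase-space subspace, rather than specified by hand as here.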