GESTURAL TRAJECTORY SYMMETRIES AND DISCOURSE SEGMENTATION

Francis Quek, Yingen Xiong, and David McNeill
Vision Interfaces & Sys. Lab. (VISLab), CSE Dept., Wright State U., OH
The University of Chicago
Correspondence: quek@cs.wright.edu

Abstract

Our approach is motivated by the conviction that gesture and speech are coexpressive of the underlying dynamic ideation that drives human communication. As such, transitions and cohesions in gestural behavior would inform us as to the discourse conceptualization. In this paper, we examine the role of motion symmetries of two-handed gestures in the structuring of speech. We employ a set of hand motion traces extracted from video and compute the correlation of these traces. The signs and magnitudes of the correlation coefficients computed in the cardinal directions of the subject's torso (lateral and vertical in this work) characterize the symmetries. We employ a windowed computation approach that permits a balance between temporal resolution and robustness to noise. The resulting correlation profiles are merged according to a temporal proximity rule. We apply this analysis to two conversational video sequences. A detailed analysis of the first sequence reveals the persistence of gestural imagery between semantically-similar discourse pieces. A symmetry transition analysis is applied to the second dataset and compared against a manually generated discourse segmentation to demonstrate the potential of cross-modal discourse segmentation.

1. INTRODUCTION

Human language is a dynamic interplay among our various communicative channels, which include speech, prosody, gesture, gaze, facial expression, and body posture [1]. These modalities do not function independently, nor is any modality subservient to another (as when one inserts a gesture as an accompaniment to speech after the speech plan is in place).
Instead, they proceed co-equally from the same thought process that produces an utterance, and each carries aspects of the original thought [2]. As such, each mode bears the mark of the thought structure in some way. In gesture, this reveals itself in the form of a Catchment [3]. The Catchment concept states that recurrence of ideation in discourse reveals itself in recurrence of gestural features.

In this paper, we explore the relationship of hand symmetries (and their classification) to speech in discourse structuring. Concerning symmetry in sign language and gesture, Kita wrote: "When two strokes by two hands coincide in sign language, the movements obey the well-known Symmetry Condition, which states that the movement trajectory, the hand orientation, the hand shape, and the hand-internal movement have to be either the same or symmetrical ... the Symmetry Condition also holds for gestures" [4, 5]. In fact, it appears that when both hands are engaged in gesticulation there is almost always a motion symmetry (either lateral, vertical, or near-far with respect to the torso), or one hand serves as a platform hand for the other moving hand. This tyranny of symmetry seems to lift for two moving hands during speech when one hand is performing a pragmatic task (e.g. driving while talking and gesturing with the other hand). Such pragmatic movements also include the retraction of one hand (to transition to a one-handed (1H) gesture) and the preparation of one hand (to join the other for a two-handed (2H) gesture, or to change the symmetry type).

(This research has been supported by the U.S. National Science Foundation STIMULATE program, Grant No. IRI-9618887, "Gesture, Speech, and Gaze in Discourse Segmentation," and the NSF KDI program, Grant No. BCS-9980054, "Cross-Modal Analysis of Signal and Sense: Multimedia Corpora and Tools for Gesture, Speech, and Gaze Research.")
One of the governing principles for the study of multimodal communication is the temporal cohesion across modes [6, 7, 8, 1]. In earlier work we performed manual observation of computed hand motion traces of a short discourse segment and showed segmentation of this discourse by handedness and kinds of symmetry [9, 10]. In this paper, we demonstrate automatic symmetry extraction on two discourse videos, and compare these against the speech to gauge its efficacy in discourse segmentation.

We present a method to detect symmetric human hand gestures. First, we track the subject's hands in video data. Second, we compute the local correlation coefficients of the hand gesture signals to detect gesture symmetries. Finally, we analyze the relationship between symmetric gestures and speech in two sets of discourse videos.

2. SYMMETRY DETECTION

The correlation of two signals tells us the relationship between them. From the correlation coefficient of the two signals we can determine whether, at a given moment, the hand movements are in the same direction, in opposite directions, or unrelated. If $x_L(n)$ and $x_R(n)$ are the left- and right-hand trajectories respectively, we can compute the correlation coefficient of the signals as:

$$\rho = \frac{\sum_n \left(x_L(n) - \bar{x}_L\right)\left(x_R(n) - \bar{x}_R\right)}{\sqrt{\sum_n \left(x_L(n) - \bar{x}_L\right)^2}\,\sqrt{\sum_n \left(x_R(n) - \bar{x}_R\right)^2}} \quad (1)$$

where $\bar{x}_L$ and $\bar{x}_R$ are the mean values of $x_L$ and $x_R$ respectively, $n$ denotes the frame number, and $x$ denotes the positional value (if $x$ is the lateral value of the hand position, we are computing lateral symmetry).

Equation 1 yields the global relationship between the left-hand signal and the right-hand signal. To obtain local symmetry information, we employ a windowing approach, replacing the global means with windowed means:

$$\bar{x}_L(n) = (w * x_L)(n), \qquad \bar{x}_R(n) = (w * x_R)(n) \quad (2)$$

where $w$ is the selected window and $*$ denotes convolution. We can then obtain the local symmetry property of the signals with a suitable window by the following equation:

$$\rho(n) = \frac{\sum_{k \in W(n)} \left(x_L(k) - \bar{x}_L(n)\right)\left(x_R(k) - \bar{x}_R(n)\right)}{\sqrt{\sum_{k \in W(n)} \left(x_L(k) - \bar{x}_L(n)\right)^2}\,\sqrt{\sum_{k \in W(n)} \left(x_R(k) - \bar{x}_R(n)\right)^2}} \quad (3)$$

where $W(n)$ is the set of frames covered by the window centered at frame $n$.

Seventh International Conference on Spoken Language Processing (ICSLP), Denver, CO, September 16-20, 2002, pp. 185-188. Also available as VISLab Report VISLab-02-02.
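The windowed correlation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the rectangular window, window length, and zero-variance guard are all assumptions, and the paper does not specify its window choice. The sign of the per-frame coefficient distinguishes parallel motion (positive, hands moving the same way along an axis) from mirror-symmetric motion (negative), with values near zero indicating no relationship.

```python
import numpy as np

def windowed_correlation(left, right, win=15):
    """Per-frame local correlation of left- and right-hand position traces.

    left, right: 1-D arrays of per-frame hand positions along one axis
    (e.g. the lateral axis of the torso). Returns an array of correlation
    coefficients in [-1, 1], one per frame.
    win: window length in frames (an assumed free parameter).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Windowed means via convolution with a rectangular window (cf. Eq. 2).
    w = np.ones(win) / win
    mean_l = np.convolve(left, w, mode="same")
    mean_r = np.convolve(right, w, mode="same")
    dl = left - mean_l
    dr = right - mean_r

    # Windowed sums of cross- and auto-products (cf. Eq. 3).
    box = np.ones(win)
    num = np.convolve(dl * dr, box, mode="same")
    den = np.sqrt(np.convolve(dl**2, box, mode="same")
                  * np.convolve(dr**2, box, mode="same"))

    eps = 1e-12  # guard against zero variance when a hand is stationary
    return num / (den + eps)
```

For mirror-symmetric traces (one hand the negation of the other), the coefficient sits near -1 away from the window edge effects; for parallel traces it sits near +1, matching the sign convention used in the symmetry analysis.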