Exploring the speech-gesture semantic continuum

Farina Freigang and Stefan Kopp
farina.freigang@uni-bielefeld.de, skopp@techfak.uni-bielefeld.de
Faculty of Technology, Center of Excellence “Cognitive Interaction Technology” (CITEC)
Collaborative Research Center “Alignment in Communication” (SFB 673)
Bielefeld University, P.O. Box 100 131, D-33501 Bielefeld, Germany

In natural conversation, speech and gesture usually form a single unit that is produced or received by a communication partner. However, the relationship between the meaning of speech and the meaning of gesture can vary. Several terms have been used to specify these different relationships, ranging from “redundant” over “supplemental” to “mismatching” information. No consensus has been reached on the exact definition of these terms or on the variety of ways in which speech meaning and gesture meaning relate to each other. We argue that this confusion is due to the fact that these terms address different dimensions of the speech-gesture semantic relationship, and therefore can hardly be related directly to each other. In the following, we discuss the terminology and related studies with regard to production and comprehension.

On the side of language production, McNeill (1992) already discussed semantic synchrony in general without going into further detail. Alibali and Goldin-Meadow (1993) were the first to report “mismatches” produced by children learning the concept of mathematical equivalence. This term is not fully endorsed by Willems, Özyürek, and Hagoort (2007), who argued that the term “mismatch” should be reserved for “incongruent” speech-gesture pairs and not used when gesture conveys “additional” but not contradictory information relative to speech. They referred to the mismatch phenomenon as speech-gesture “incongruence”. Furthermore, Kelly, Özyürek, and Maris (2010) accepted both terms, “mismatch” and “incongruence”.
In this context, other terms have been mentioned, e.g., speech-gesture “concordance”, “concurrent” speech-gesture pairs, “redundant” gestures, and “semantic coordination” of speech-gesture pairs. A detailed definition of these terms and a comparison between them is as of yet still missing.

On the side of language perception, McGurk and MacDonald (1976) showed that speech perception is not a purely auditory process, but that mouth gestures can influence the recipient’s interpretation of what the message giver has said. Sometimes this interpretation results in a third meaning, different from the speech or the mouth gesture on their own. Similar to the McGurk-MacDonald effect, one can assume that observed speech-gesture mismatches or incongruences may lead to a third interpretation by a subject. Habets, Kita, Shao, Özyürek, and Hagoort (2011) examined semantically congruent and incongruent combinations (“matches” and “mismatches”), or the “semantic integration” of speech and gesture, during comprehension in an EEG study and found that “mismatching gesture-speech combinations lead to a greater negativity on the N400 component in comparison with matching combinations” (p. 1852). This suggests a cognitive basis for what counts as mismatching, in terms of whether speech and gesture can be integrated.

Figure 1: Two-dimensional space of semantic coordination.

From our point of view, the appearance and the understanding of the speech-gesture semantic relationship have much more depth than sketched so far. In figure 1, we propose a two-dimensional space that separates the semantic overlap from the semantic congruence/coherence of speech and gesture. A gesture can convey complementary (different but necessary), supplementary (additional), or redundant (corresponding, matching) information in relation to speech (and vice versa).
While this level of semantic overlap has been studied thoroughly (box in figure 1), it implicitly assumes a high level of coherence between speech and gesture meaning (in the sense of being integrable into a coherent unified interpretation). This congruence, we argue, constitutes a second dimension. If a gesture is produced or received with neither semantic overlap nor congruence with speech meaning, we define this as a strong semantic mismatch, or hereafter simply mismatch. A weaker mismatch is produced or received if moderate overlap and intermediate congruence between speech and gesture meaning are given.

With this in mind, we define a continuum of mismatches between speech and gesture (dashed arrow in figure 1). In figure 2, three examples along the