Euh as cue for speaker confidence and word searching in human spoken answers in French

Anne Garcia-Fernandez 1,2, Ioana Vasilescu 2, Sophie Rosset 2
1 University Paris-Sud, Orsay, F-91405
2 LIMSI-CNRS, Orsay, F-91403
{annegf, vasilescu, rosset}@limsi.fr

Abstract

This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of elicited human answers. Through the analysis of contextual combinatorial patterns, we investigate the role of this vocalic hesitation in introducing new information. The observations support trends noticed in other languages and suggest potential optimizations for automatic question answering systems.

Index Terms: vocalic hesitation, feeling of knowing, rephrasing, interaction management, QA systems.

1. Introduction

Spoken disfluencies have been widely investigated in computational linguistics. Approaches have focused on the automatic classification and identification of the various disfluent phenomena, on their acoustic/prosodic cross-language patterns, or on their impact on automatic performance in various spoken language processing frameworks. The initial position adopted in processing such spoken events consisted of an in-depth description aimed at efficiently cleaning the speech, that is, the lexical level of the oral message [1]. More recently, however, it has been argued that most spoken disfluencies are not problems in speaking but solutions to problems in speaking [2].

The current study is concerned with automatic question answering (QA) dialog systems. In this framework, making effective use of such disfluent devices is a practical objective, aimed at improving the interaction and making it more natural. In particular, detecting and efficiently exploiting the various disfluency-like items produced by the human interlocutor, and delivering answers with appropriate corresponding items, remains an open issue.
The emphasis here is on automatic answer modeling, with the purpose of (i) providing more accurate answers in terms of grammatical structure and information content proper, and (ii) enabling systems to evaluate their confidence in the answer. The latter is closely tied to the disfluency level: psycholinguistic studies on both speakers' and listeners' feelings about the answer delivered or received within a spoken interaction have shown that paralinguistic features of utterances such as pauses, intonation, and interjections play a compelling role [3, 4].¹ In line with these findings, computational studies have focused on monitoring speakers' feelings (such as uncertainty) with the aim of improving system performance [5, 6].

¹ The meta-cognitive state of the speaker displayed in an answer has been associated with the feeling of knowing (FOK), that is, speakers' ability to accurately monitor their knowledge about the information delivered. Listeners, in turn, are able to infer such states by making smart use of paralinguistic cues (i.e., feeling of another's knowing, FOAK) [4].

We focus here on the distribution of combinatorial patterns of the vocalic hesitation euh in French in a corpus of elicited human answers. Our working hypothesis is that euh may also point to the answer structure by signaling newly provided information. The longer-term objective is to model such items in a natural language answer generation framework.

2. Data description and annotation methodology

2.1. The MACAQ corpus

The MACAQ corpus (Multi-Annotated Corpus of Answers to Questions) [7] is composed of answers provided by humans. They were elicited through a number of questions similar to those usually addressed to an automatic system operating in an open domain. Questions with various controlled linguistic forms were manually generated beforehand to illustrate the potential range of interrogative structures a human may address to an automatic system.
The questions were then put to humans, who provided answers.² MACAQ was built with the aim of portraying typical answers to serve as a model for further automatic generation of answers in natural language.

² Both spoken and written modalities were elicited, but we consider only the speech corpus in this paper.

Duration                  1h10
# Answers                 1,044
# Lexical items           6,472
# Lexical types           657
# Utterances with euh     244 (23.4%)

Table 1: General description of the MACAQ corpus.

Table 1 details the content of the corpus: more than 23% of the utterances contain at least one vocalic hesitation; that is, following [4], in about one answer in four the speaker displays some doubt or at least needs more time to formulate the answer.

2.2. Annotation methodology

The manual annotation strategy adopted is described in [7]. The elements of the question that are reused in the answer were annotated; these elements, tagged QUE, correspond to old information. The information answering the question (as defined in [7]), which corresponds to new information, was annotated with the tag ANS. A distinction was made between the expected answer and some additional