SPOKEN LANGUAGE UNDERSTANDING STRATEGIES ON THE FRANCE TELECOM 3000 VOICE AGENCY CORPUS

Géraldine Damnati (1), Frédéric Béchet (2), Renato De Mori (2)

(1) France Télécom R&D - TECH/SSTP/RVA, 2 av. Pierre Marzin, 22307 Lannion Cedex 07, France
(2) LIA - University of Avignon, BP1228, 84911 Avignon cedex 09, France
{frederic.bechet,renato.demori}@univ-avignon.fr, geraldine.damnati@orange-ft.com

ABSTRACT

Telephone services are now deployed that allow users to react to telephone prompts in spoken natural language. These systems have limited domain semantics and dialogue strategies which are represented by finite state diagrams. Most of these systems adopt a sequential approach in which Automatic Speech Recognition (ASR), Spoken Language Understanding (SLU) and Dialogue Management (DM) are separate processes. In the framework of the France Telecom 3000 voice service, we study several strategies for integrating these three processes more closely. Using a Finite State Machine paradigm that encodes the different models used at these three levels, we show how the search for the best sequence of dialogue states can be done simultaneously at the word, concept, interpretation and dialogue state levels.

Index Terms: Automatic Speech Recognition, Spoken Language Understanding, Language Models, Spoken Dialogue Systems

1. INTRODUCTION

Telephone services are now deployed that allow users to react to telephone prompts in spoken natural language. These systems have limited domain semantics and dialogue strategies which are represented by finite state diagrams. State transitions are associated with semantic knowledge represented by logical expressions over conceptual entities. Such a representation is manually derived and is adequate because it has been compiled by experts with a deep knowledge of the domain.
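As an illustration of the representation just described, the following minimal sketch encodes a finite state dialogue diagram whose transitions are guarded by logical expressions over concepts. The state names, concept names and the `next_states` helper are hypothetical and chosen only for the example; they are not taken from the 3000 service.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List

# A guard is a logical expression evaluated on the set of concepts
# detected in one user utterance.
Guard = Callable[[FrozenSet[str]], bool]


@dataclass(frozen=True)
class Transition:
    source: str   # dialogue state before the user turn
    target: str   # dialogue state after the user turn
    guard: Guard  # logical expression over conceptual entities


def next_states(state: str, concepts: Iterable[str],
                transitions: List[Transition]) -> List[str]:
    """Return every state reachable from `state` given the detected concepts."""
    found = frozenset(concepts)
    return [t.target for t in transitions
            if t.source == state and t.guard(found)]


# Hypothetical diagram: state and concept names are assumptions.
TRANSITIONS = [
    Transition("start", "billing",
               lambda c: "BILL" in c and "PAY" in c),
    Transition("start", "service_info",
               lambda c: "SERVICE" in c and "PAY" not in c),
    Transition("start", "start",
               lambda c: not c),  # no concept detected: stay and re-prompt
]

print(next_states("start", {"BILL", "PAY"}, TRANSITIONS))
print(next_states("start", {"SERVICE"}, TRANSITIONS))
```

Writing the guards as plain predicates keeps the sketch short; in a hand-built system of the kind described above, they would be compiled by domain experts.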
The real problems in these systems stem from automatic speech recognition (ASR) errors and from the difficulty of modeling the relations between concepts and the way people express them. To take these sources of imprecision into account, we propose in this paper a dialogue strategy that allows for the possibility that the dialogue is not in a single state at a given phase of its evolution. Rather, the dialogue can be in different states, and a language generator component generates a prompt to the user that attempts to gather information useful not only for the progress of the dialogue towards a final state but also for reducing the entropy of the information about the actual dialogue state.

In this framework, state transitions are labeled by the fact that the ASR results cause certain premises to be true and that an inference process establishes the truth of derived assertions. As the inference process is guided by inference rules, Finite State Machines (FSM) are derived from them and plugged into the dialogue Stochastic Finite State Machine (SFSM) in such a way that the probabilities of dialogue states can be obtained from word lattice probabilities using operations on automata.

(Footnote: collaboration with France Telecom R&D - contract N 021B178)

When dealing with corpora collected from real users, one has to be able to handle Out-Of-Domain (OOD) utterances. Users who are familiar with a service are likely to be efficient and to answer the system's prompts strictly. New users can have more diverse reactions and typically make more comments about the system. We propose in this paper to detect such OOD utterances in a first step, before entering the Spoken Language Understanding (SLU) module. Indeed, standard Language Models (LMs) applied to OOD utterances are likely to produce very noisy word lattices, to which applying SLU modules may not be relevant.
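To make the notion of "entropy of the information about the actual dialogue state" mentioned earlier in this section more concrete, here is a minimal sketch, assuming the dialogue state is tracked as a discrete probability distribution. It computes the Shannon entropy of that distribution and selects the prompt with the lowest expected entropy after the user's answer. The state names, the probabilities and the candidate prompts are invented for the example, and the selection rule is one plausible reading of the strategy, not necessarily the criterion used in the paper.

```python
import math
from typing import Dict, List, Tuple

Belief = Dict[str, float]  # dialogue state -> probability


def entropy(belief: Belief) -> float:
    """Shannon entropy (in bits) of a distribution over dialogue states."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def expected_entropy(outcomes: List[Tuple[float, Belief]]) -> float:
    """Expected entropy after a prompt.

    `outcomes` lists, for each possible kind of user answer, its
    probability and the resulting distribution over dialogue states.
    """
    return sum(weight * entropy(posterior) for weight, posterior in outcomes)


# Hypothetical current belief over three dialogue states.
belief = {"billing": 0.5, "service_info": 0.25, "call_forwarding": 0.25}

# Hypothetical prompts: A separates "billing" from the rest,
# B is uninformative and leaves the belief unchanged.
prompts = {
    "A": [(0.5, {"billing": 1.0}),
          (0.5, {"service_info": 0.5, "call_forwarding": 0.5})],
    "B": [(1.0, belief)],
}

best_prompt = min(prompts, key=lambda name: expected_entropy(prompts[name]))
print(entropy(belief), best_prompt)
```

With these invented numbers, prompt A is preferred because its expected posterior entropy is lower than that of prompt B.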
Furthermore, when designing a general interaction model such as the transition state model proposed in this paper, OOD utterances are as harmful for state prediction as an out-of-vocabulary word is for predicting the next word with an n-gram LM. This is why we propose a new LM that integrates two sub-LMs: one LM for transcribing in-domain phrases, and one LM for detecting and deleting OOD phrases. Finally, the different SLU strategies proposed in this paper are applied only to the portions of signal labeled as in-domain utterances.

The paper is organized as follows. Section 2 describes the vocal service on which this study was carried out. Sections 3 and 4 outline the ASR and SLU decoding processes leading to the computation of state probabilities. Section 5 provides details of the interpretation knowledge, and experimental results are given in Section 6.

2. DESCRIPTION OF THE FRANCE TELECOM 3000 VOICE AGENCY CORPUS

The 3000 service, the first deployed vocal service at France Telecom exploiting natural language technologies, was made available to the public in October 2005. 3000 is France Telecom's voice agency, which enables customers to obtain information about and purchase almost 30 different services, check their consumption, pay their bills and access the management of their services, such as call forwarding or voice messaging. The continuous speech recognition system relies on a bigram language model. The interpretation is achieved through the Verbateam two-step semantic analyzer. Verbateam includes a set of rules that convert the sequence of words hypothesized by the speech recognition engine into a sequence of concepts, and an inference process that outputs an interpretation label from a sequence of concepts. Given the main functionalities of the application, two types of dialogues can be distinguished. Some users dial 3000 to ac-