T-SA-00576-2004.R1

A Probabilistic Framework for Dialog Simulation and Optimal Strategy Learning

Olivier Pietquin, Member, IEEE, and Thierry Dutoit, Member, IEEE

Abstract—The design of Spoken Dialog Systems cannot be considered as the simple combination of speech processing technologies. Indeed, speech-based interface design has long been an expert job: it requires good skills in speech technologies and low-level programming, while rapid development and the reuse of previously designed systems remain difficult. This makes optimal design and objective evaluation very hard to achieve. The design process is therefore a cyclic one, composed of prototype releases, user satisfaction surveys, bug reports and refinements. It is well known that human intervention for testing is time-consuming and, above all, very expensive. This is one of the reasons for the recent interest in dialog simulation, for evaluation as well as for design automation and optimization.

In this paper we present a probabilistic framework for the realistic simulation of spoken dialogs, in which the major components of a dialog system are modeled and parameterized using independent data or expert knowledge. In particular, an Automatic Speech Recognition (ASR) system model and a User Model (UM) have been developed. The ASR model, based on articulatory similarities in language models, provides task-adaptive performance prediction and Confidence Level (CL) distribution estimation. The user model relies on the Bayesian Network (BN) paradigm and is used both for user behavior modeling and for Natural Language Understanding (NLU) modeling. The complete simulation framework has been used to train a reinforcement-learning agent on two different tasks. These experiments helped to point out several potentially problematic dialog scenarios.

Index Terms—Spoken dialog systems, speech processing, dialog simulation, dialog evaluation, dialog strategy learning.

Manuscript received June 30, 2004; revised December 1, 2004. This work was partly supported by the ‘Direction Générale des Technologies, de la Recherche et de l’Énergie’ (DGTRE) of the Walloon Region (Belgium, First Europe Convention n° 991/4351) as well as by the SIMILAR European Network of Excellence.
O. Pietquin is now with the Signal Processing Systems group at the Metz campus of the École Supérieure d’Électricité (Supélec), F-57070 Metz, France (e-mail: olivier.pietquin@ieee.org). This work was carried out while he was with the Signal Processing department (TCTS Lab) of the Faculty of Engineering, Mons, B-7000 Mons, Belgium.
T. Dutoit is with the Signal Processing department (TCTS Lab) of the Faculty of Engineering, Mons, B-7000 Mons, Belgium (e-mail: thierry.dutoit@fpms.ac.be).

I. INTRODUCTION

DIALOG simulation has become the subject of much recent research, for several reasons. Initially, systems developed for simulating dialogs aimed at validating dialog or discourse models. They were essentially systems involving two artificial agents that implemented a formal description of the model and communicated with each other. Their simulated exchanges were logged and compared with human-human dialogs [1].

During the Spoken Dialog System (SDS) design process, a series of prototypes is typically released, and enhancements from one system to the next are made by collecting a corpus of dialogs between users and prototypes in order to evaluate their performance. One way to avoid prototype releases is to use Wizard-of-Oz (WOZ) techniques, but these still require the intervention of human users. Another application of computer-based dialog simulation is therefore the evaluation of SDSs. A first approach is to build handcrafted systems able to interact with the SDS by some means, and to evaluate the quality of the resulting dialog sessions using objective metrics [2]. While this technique avoids human involvement, such a system is driven by a set of fixed parameters and always behaves the same way in a given situation unless its parameters are modified. The most powerful simulation systems for evaluation purposes are thus probabilistic.

Data set expansion is another application. Indeed, if user evaluations of one or more prototypes have already been carried out, a corpus of dialogs has been collected that is hopefully representative of the possible interactions between users and prototypes. This corpus can be used to build a generative model of the SDS environment [3]. The resulting model is then theoretically able to produce new dialogs having the same statistical properties as those in the corpus, which makes it possible to evaluate unseen dialogs.

Finally, a last application is simulation for optimal strategy learning. Indeed, once a way to simulate many dialogs is available, as well as a way to evaluate them, it is natural to exploit this framework to learn optimal strategies from experience [4], [5]. The evaluation method is then used to build an optimality criterion. Since the learning process requires a large number of dialogs to converge toward an optimal strategy, computer-based simulation is particularly valuable. Indeed, real interactions between a learning agent and human users would be unfeasible given the number of dialogs needed; what is more, some of these dialogs can be long and sound unnatural, and the overall operation would be very expensive and time-consuming.

In this paper we present a probabilistic framework for simulating dialogs, based on parametric models of speech and language processing modules as well as on stochastic user modeling. This framework has been developed with the aim of evaluating dialog systems and learning optimal strategies, but nothing prevents its use for dialog model validation. It is based on a probabilistic description of a man-machine dialog, which