Combining User Intention and Error Modeling for Statistical Dialog Simulators

Silvia Quarteroni 1, Meritxell González 1,2, Giuseppe Riccardi 1, Sebastian Varges 1

1 DISI - University of Trento, 38050 Povo (Trento), Italy
2 TALP Center, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain

silviaq@disi.unitn.it, mgonzalez@lsi.upc.edu, riccardi@disi.unitn.it, varges@disi.unitn.it

Abstract

Statistical user simulation is an efficient and effective way to train and evaluate the performance of a (spoken) dialog system. In this paper, we design and evaluate a modular data-driven dialog simulator in which we decouple the "intentional" component of the User Simulator from the Error Simulator, which represents different types of ASR/SLU noisy-channel distortion. While the former is composed of a Dialog Act Model, a Concept Model and a User Model, the latter is centered around an Error Model. We test different Dialog Act Models and Error Models against a baseline dialog manager and compare results with real dialogs obtained using the same dialog manager. In terms of dialog act, task and concept accuracy, our results show that 1) data-driven Dialog Act Models achieve good accuracy with respect to real user behavior and 2) data-driven Error Models bring task completion times and rates closer to real data.

1. Introduction

Data-driven techniques are a widely used approach to the development of robust (spoken) dialog systems, particularly when training statistical dialog managers (DMs) [1, 2]. Generating the data to train such DMs can be costly, as potential users are not always available for the task at hand; moreover, once the data is available, it must be manually analyzed and annotated. This is why user simulators (US) have been introduced to replace real conversations with synthetic ones and to optimize a number of SDS components.
Indeed, several approaches to the design of user simulators exist, as illustrated in [1]: since we aim to train statistical DMs [2], we focus on the intention (rather than lexical) level of simulation, as formalized in [3]. In this paper, we: 1) design a simulator where the Error Model derives its parameter estimates from real conversations; 2) define and implement different simulation models by varying the Dialog Act Model and the Error Model components; 3) evaluate different simulators against real dialogs in terms of dialog act, task and concept accuracy. In particular, Section 2 presents our simulator architecture and Section 3 presents the simulation environment in which we conduct our experiments, which are illustrated in Section 4. Finally, Section 5 positions our research in the context of related work, and our conclusions and future work are summarized in Section 6.

[Work partly funded by EU project ADAMACH (contract 022593).]

2. Simulator Architecture

Data-driven simulation takes place within the rule-based version of the ADASearch system [2], which uses up to 16 dialog acts (described in [4]) to deal with three tasks and a dozen concepts related to lodging and events in Trentino, Italy.

[Figure 1: Architecture of the simulation environment]

Since simulation in our framework occurs at the intention level, the simulator and the DM exchange actions, i.e. ordered sequences of dialog acts and (optionally) concept-value pairs. As illustrated in Figure 1, at turn t the DM issues an action a_s, defined as an ordered dialog act sequence a_s = {da_0, ..., da_n}, where each dialog act carries zero or more concept-value pairs: da_j = da_j(c_0(v_0), ..., c_m(v_m)). For instance, we could have a_s = {Apology(); Clarif-request(Event_type(fair))}.
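The action representation described above (an ordered dialog act sequence, each act carrying zero or more concept-value pairs) can be sketched as a small data structure. This is a minimal illustration with assumed names (`DialogAct`, `Action`), not the ADASearch implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class DialogAct:
    """One dialog act, e.g. Apology or Clarif-request, with its
    concept-value pairs (a value may be absent)."""
    name: str
    concepts: Dict[str, Optional[str]] = field(default_factory=dict)

    def __str__(self) -> str:
        args = ", ".join(f"{c}({v})" if v is not None else c
                         for c, v in self.concepts.items())
        return f"{self.name}({args})"

@dataclass
class Action:
    """An action is an ordered sequence of dialog acts."""
    dialog_acts: List[DialogAct]

    def __str__(self) -> str:
        return "{" + "; ".join(str(da) for da in self.dialog_acts) + "}"

# The DM action from the example in the text:
a_s = Action([DialogAct("Apology"),
              DialogAct("Clarif-request", {"Event_type": "fair"})])
print(a_s)  # {Apology(); Clarif-request(Event_type(fair))}
```

A structure like this makes it straightforward for the simulator and DM to exchange actions at the intention level, without committing to any lexical realization.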
A User Simulator and an Error Simulator are then involved: the former estimates a plausible user action â_u given the DM action a_s; the latter distorts â_u into an N-best list of simulated actions S = {a_u^0, ..., a_u^N} received by the DM at turn t+1 in "replacement" of the user-ASR-SLU pipeline. A confidence score is associated with each simulated action a_u^j and with each individual concept forming a brick of interpretation.

In order to generate the list S of upcoming action hypotheses, the probability of each action being generated after the previous DM action a_s is estimated based on the conversation context. Such a context is represented by a User Model, a Dialog Act Model, a Concept Model and an Error Model. In particular:

- the User Model simulates the behavior of an individual user in terms of goals and other caller-specific features such as cooperativeness and tendency to hang up;
- the Dialog Act Model generates a distribution of M actions A_u = {a_u^0, ..., a_u^M}, each a_u^i being a plausible sequence of dialog acts and concepts given a_s; one action â_u is chosen out of A_u following specific criteria;
- the Concept Model generates concept values for â_u by estimating P(c_i(v_i) | da(c_0, ..., c_m)), i ∈ [0..m];
- the Error Model simulates the noisy ASR-SLU channel by "distorting" â_u with errors; it estimates

[INTERSPEECH 2010, 26-30 September 2010, Makuhari, Chiba, Japan. Copyright © 2010 ISCA.]
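One simulation turn as described above (Dialog Act Model proposes candidate user actions given a_s, one is chosen, the Error Model distorts it into an N-best list with confidence scores) can be sketched as follows. All function names, the toy distribution and the confusion list are illustrative assumptions, not the authors' models:

```python
import random

def dialog_act_model(a_s, m=3):
    """Return up to M plausible user actions with probabilities,
    given the DM action a_s. Here: a toy hand-written distribution
    for a clarification request about an event type."""
    candidates = {
        "Answer(Event_type(fair))": 0.6,
        "Answer(Event_type(concert))": 0.3,
        "Hangup()": 0.1,
    }
    return dict(list(candidates.items())[:m])

def error_model(a_u_hat, n=2, confusions=("concert", "market")):
    """Distort the chosen action into an N-best list of
    (action, confidence) pairs, standing in for the noisy
    ASR-SLU channel: the correct hypothesis gets a high score,
    corrupted concept values get lower ones."""
    nbest = [(a_u_hat, round(random.uniform(0.7, 1.0), 2))]
    for wrong in confusions[:n - 1]:
        corrupted = a_u_hat.replace("fair", wrong)
        nbest.append((corrupted, round(random.uniform(0.1, 0.6), 2)))
    return sorted(nbest, key=lambda h: -h[1])

a_s = "Clarif-request(Event_type(fair))"
A_u = dialog_act_model(a_s)
a_u_hat = max(A_u, key=A_u.get)   # choose the most probable action
S = error_model(a_u_hat)          # N-best list passed to the DM at t+1
for action, score in S:
    print(score, action)
```

In the paper's data-driven setting, both the action distribution and the distortion probabilities would be estimated from annotated real conversations rather than hand-written as in this sketch.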