A DISCRIMINATIVE SEQUENCE MODEL FOR DIALOG STATE TRACKING USING USER GOAL CHANGE DETECTION

Yi Ma, Eric Fosler-Lussier
Ohio State University, Department of Computer Science and Engineering
Columbus, Ohio 43210

ABSTRACT

Due to the dominating influence of the Partially Observable Markov Decision Process (POMDP) framework in spoken dialog systems, most previously proposed dialog state tracking methods favor generative models. In this work, however, we adopt a discriminative approach to model the evolution of the belief state within a spoken dialog system; more specifically, we use Conditional Random Fields (CRFs). Although we are not the first to apply CRFs to dialog state tracking, the proposed approach treats dialog state tracking as a sequence tagging problem, in the hope of capturing the evolving user goals during a dialog. Equipped with an incremental decoding strategy as well as user goal change detection, our results show that both sequence modeling and goal change information can benefit the task.

Index Terms— Dialog state tracking, spoken dialog system, Conditional Random Field, discriminative model, user goal change

1. INTRODUCTION

An effective spoken dialog system must keep track of what the user wants – namely, the user goals or dialog state – at any point during a dialog. Since speech recognition is inevitably error-prone, it is crucial for a spoken dialog system to include a dialog state tracker that accurately estimates the true user goal, so that the dialog manager can use the inferred user goal (often represented by joint slot values) to generate the optimal response to the user.
However, due to the inherent uncertainty of recognizing human speech, it is difficult for the machine to infer the true user goal when conflicting dialog state values are observed in the middle of a dialog – is the user trying to correct a misunderstanding by the system, did the system incorrectly hypothesize a correction, or did the user change her mind and is looking for alternatives? The process of requesting an alternative should affect the internal belief state of the system, and the resulting interaction with the user, differently than correcting an erroneous input; for example, with a correctly detected change of goal, the new hypothesis should have a much higher confidence than the hypothesis it replaces. Most state-of-the-art spoken dialog systems assume the user goal is fixed during a dialog, so they do not have to distinguish misunderstandings from goal changes. However, the Dialog State Tracking Challenge [1] provided a means of beginning to investigate this phenomenon.

In [2], we conducted a pilot study in which we trained a Maximum Entropy (MaxEnt) classifier to detect a specific user dialog act called reqalts 1 as an approximation to detecting user goal change, since most mind-changing dialog turns result from users exploring alternatives. With the trained MaxEnt user goal change detector, a binary decision – whether the user changes her mind or not – is made for each turn during testing. The output of the goal change detector can then be used for downstream reasoning to infer the dialog state by injecting this observed prior knowledge into a discriminative probabilistic graphical model. In general, we are interested in improving the performance of dialog state tracking by explicitly predicting changes of the user goal.
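As a rough illustration of such a turn-level goal-change detector, the sketch below fits a binary logistic-regression (MaxEnt) classifier over simple surface-cue indicator features. The cue list, feature set, and toy training data are invented for illustration; the actual detector in [2] uses a richer feature set.

```python
import math

# Hypothetical surface cues for the reqalts act; phrases like
# "how about" and "anything else" are the kind of simple language
# cues found to be indicative of requesting alternatives.
CUES = ["how about", "what if", "anything else", "something else"]

def featurize(utterance):
    """Binary feature vector: a bias term plus one indicator per cue."""
    text = utterance.lower()
    return [1.0] + [1.0 if cue in text else 0.0 for cue in CUES]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.5):
    """Fit binary MaxEnt (logistic regression) weights by gradient ascent
    on the conditional log-likelihood of the labeled turns."""
    w = [0.0] * (1 + len(CUES))
    for _ in range(epochs):
        for utterance, label in examples:
            x = featurize(utterance)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                w[i] += lr * (label - p) * x[i]
    return w

def predict_goal_change(w, utterance):
    """Binary decision for a turn: did the user request an alternative?"""
    x = featurize(utterance)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5

# Toy labeled turns (invented): 1 = reqalts, 0 = other dialog act.
data = [
    ("how about an italian place instead", 1),
    ("anything else in the north part of town", 1),
    ("what if it is cheap", 1),
    ("i want a thai restaurant", 0),
    ("yes that one please", 0),
    ("phone number please", 0),
]
w = train(data)
```

The per-turn decision produced by `predict_goal_change` is what would be injected as an observed feature into the downstream graphical model.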
In this work, we explore injecting this goal-change detection feature into a dialog state tracker that models belief tracking as a sequence tagging task (with each turn representing a time slice). By modeling tracking in this manner, we can exploit uncertainty in previous turns by summing over the hypothesized dialog state paths extending from the beginning of the dialog up to the current turn. In addition, keeping track of the dialog state from previous turns helps the dialog manager handle user responses to system confirmations, especially when the user postpones the answer to an implicit confirmation by the system.

2. PREVIOUS WORK

Since the Partially Observable Markov Decision Process (POMDP) [3] provides a unified statistical framework for

1 Short for request alternatives. A dialog turn is labeled with the user dialog act reqalts when the user is requesting an alternative (different) value for a domain slot during the dialog. We found that simple language cues such as 'how about', 'what if', and 'anything else' are highly accurate and indicative features for detecting the reqalts act.
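The path summation over turns described in the introduction can be sketched as a standard forward pass in log space: the belief over slot values at the current turn sums over all state paths from the first turn, rather than following only the single best path. The state set, scores, and transition penalty below are invented toy values, not the paper's CRF features.

```python
import math

def logsumexp(xs):
    """Numerically stable log of a sum of exponentials."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Hypothetical slot values for a single slot (e.g. cuisine type).
STATES = ["italian", "french", "thai"]

def belief_over_turns(emission_scores, transition_score):
    """Forward pass over turns: returns the normalized belief over
    states at the final turn, summing over all hypothesized state
    paths from the first turn up to the current one."""
    alpha = dict(emission_scores[0])  # log-scores at the first turn
    for scores in emission_scores[1:]:
        alpha = {
            s: scores[s] + logsumexp(
                [alpha[r] + transition_score(r, s) for r in STATES])
            for s in STATES
        }
    z = logsumexp(list(alpha.values()))
    return {s: math.exp(alpha[s] - z) for s in STATES}

def transition(prev, cur, change_penalty=2.0):
    """Toy transition score: staying on the same goal is free; switching
    pays a penalty (which a goal-change detector could reduce)."""
    return 0.0 if prev == cur else -change_penalty

# Toy per-turn scores: turn 1 weakly favors "italian"; at turn 2 the
# recognizer evidence strongly favors "thai".
turns = [
    {"italian": 1.0, "french": 0.2, "thai": 0.1},
    {"italian": 0.2, "french": 0.1, "thai": 2.0},
]
belief = belief_over_turns(turns, transition)
```

In this toy setup the strong second-turn evidence overcomes the switching penalty, so the belief shifts to the new value while the alternative paths still retain some mass.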