Automatic Feature Selection for Predicting Content of User Utterances in Dialogs

Svetlana Stoyanchev and Amanda Stent
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-4400
svetlana.stoyanchev, amanda.stent @gmail.com

1. Introduction

In task-oriented spoken dialog systems (SDS), the system often requests explicit confirmation of user-provided task-relevant concepts. The user utterance following a confirmation question (the post-confirmation utterance) is important to successful dialog outcomes. It may be a simple confirmation or rejection (e.g. yes, no, right, correct), or a correction or a topic change containing new concepts. For example, in a set of one month of calls to the deployed Let's Go! bus route information SDS (Raux et al., 2005), 18% of post-confirmation utterances contain a concept (time, place, or bus) and 20% of these contain a concept type different from that in the system's confirmation prompt. The speech recognition word error rate (WER) on all post-confirmation utterances in this set is 38%, while on post-confirmation utterances with a concept it is 49%. Correct identification of concept types in post-confirmation utterances could lead to improved speech recognition and dialog outcomes.

In this paper, we propose a concept-specific language model adaptation strategy and evaluate it on post-confirmation utterances. We adopt a two-pass recognition approach (Young, 1994). In first-pass recognition, the input utterance is processed using a generic language model trained on post-confirmation utterances. Recognition with a generic model frequently fails on concept words such as Oakland or 61C. We then use acoustic, lexical, and dialog history features to determine the task-related concept type(s) likely to be present in the utterance. Finally, any utterance that is determined to contain a concept type is re-processed using a concept-specific language model.
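The two-pass control flow just described can be sketched as follows. This is a minimal illustration, not the deployed system: the recognizer and the concept-type classifier are passed in as callables, the language model names are invented for the example, and the tie-breaking rule when several concept types are predicted is an arbitrary placeholder.

```python
# Sketch of two-pass recognition with concept-specific LM adaptation.
# LM names and the multi-type tie-break are illustrative assumptions.

GENERIC_LM = "postconfirm-generic"
CONCEPT_LMS = {"time": "postconfirm-time",
               "place": "postconfirm-place",
               "bus": "postconfirm-bus"}

def two_pass_recognize(utterance, recognize, classify_concepts):
    """recognize(utterance, lm_name) -> transcript;
    classify_concepts(transcript, utterance) -> set of concept types."""
    # Pass 1: generic LM trained on post-confirmation utterances.
    first_pass = recognize(utterance, GENERIC_LM)
    # Predict which concept types (time/place/bus) are present, using
    # acoustic, lexical, and dialog-history features.
    types = classify_concepts(first_pass, utterance)
    if not types:
        # Simple confirmation/rejection: keep the generic result.
        return first_pass
    # Pass 2: re-recognize with a concept-specific LM. If several types
    # are predicted we pick one arbitrarily here for illustration.
    lm = CONCEPT_LMS[sorted(types)[0]]
    return recognize(utterance, lm)
```

In a live system the two callables would wrap the ASR engine and the trained concept-type classifier; stubbing them out with lookup tables is enough to exercise the control flow.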
We show that: (1) it is possible to achieve high accuracy in determining the presence or absence of particular concept types in a post-confirmation utterance; and (2) concept-specific language model adaptation can lead to improved speech recognition performance for post-confirmation utterances. In this paper we focus on alternative methods for selecting lexical features for concept type classification in the presence of first-pass recognition errors.

2. Classification Experiment

In the Let's Go! SDS, there are three concept types: time, place, and bus. We train a binary classifier for each concept type using Weka's J48 decision tree implementation (Witten and Frank, 2005). We use acoustic/prosodic (RAW) features (F0 max, energy, utterance duration, and the difference between F0 max in the first and second halves of the utterance (Litman et al., 2006)), and dialog history (DH) features (the present and previous dialog states in this dialog, from the Let's Go! system logs). We also use lexical (LEX) features from first-pass recognition. (All these features are available at run-time in a live SDS.)

Each concept type is associated with different words and phrases, e.g. "leaving", "arriving", "to", "from", "downtown", "61C". Also, the absence of a concept type is associated with particular words, e.g. "yes" and "no", which indicate simple confirmations and rejections. In this domain, concept-related noun phrases are frequently misrecognized in first-pass recognition, so we cannot rely on these for concept type classification. Instead, we have to select a reliable set of lexical features for concept type classification that is robust to first-pass recognition errors. We experimented with three different sets of lexical features obtained using two different methods: manual selection, and automatic selection using mutual information.
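The mutual-information criterion used for automatic feature selection can be sketched as below. The corpus here is a toy stand-in (the real training data is a month of transcribed Let's Go! calls with concept words removed); the scoring follows the standard pointwise formula for mutual information between a binary word-occurrence variable and a binary concept-presence label.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """MI between word occurrence and concept label, from the 2x2
    contingency counts (n11 = word present and concept present, etc.)."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    for nxy, nx, ny in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if nxy > 0:  # 0 * log 0 := 0
            mi += (nxy / n) * math.log((nxy * n) / (nx * ny))
    return mi

def select_features(corpus, k):
    """Rank unigrams by MI with the concept label; ties break
    alphabetically. corpus: list of (transcript, 0/1 label) pairs."""
    vocab = {w for text, _ in corpus for w in text.split()}
    scores = {}
    for w in vocab:
        n11 = sum(1 for t, y in corpus if y == 1 and w in t.split())
        n10 = sum(1 for t, y in corpus if y == 0 and w in t.split())
        scores[w] = mutual_information(
            n11, n10,
            sum(1 for _, y in corpus if y == 1) - n11,
            sum(1 for _, y in corpus if y == 0) - n10)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Toy post-confirmation transcripts, labeled for presence of a "place"
# concept; concept words themselves are assumed already removed.
corpus = [
    ("i am leaving from", 1),
    ("going to the", 1),
    ("yes that is correct", 0),
    ("no", 0),
    ("leaving from the", 1),
    ("yes", 0),
]
```

On this toy corpus, words that co-occur consistently with one label ("from", "leaving", "yes") score highest, mirroring how the method surfaces robust non-concept cue words rather than easily misrecognized concept words.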
Manual selection. In our first method, we manually selected a small set of lexical features likely to be recognized with high accuracy, LEX5: "I" (indicates presence of any concept type), "yes" and "no" (indicate absence of any concept type), and "to" and "from" (indicate presence of place).

Automatic selection. In our second method, we used mutual information between words/phrases and concept types (Manning et al., 2008) to automatically select lexical features from a set of training data. As our training data, we used one month of Let's Go! calls from 2005. We extracted from the data all the transcribed user utterances. We removed all words that realize concepts (e.g. "61C", "Squirrel Hill"), as these are likely to be misrecognized in first-pass recognition. We then extracted as possible lexical features all word unigrams and