Automatic Feature Selection for Predicting Content of User Utterances in Dialogs

Svetlana Stoyanchev and Amanda Stent
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-4400
svetlana.stoyanchev, amanda.stent @gmail.com

1. Introduction

In task-oriented spoken dialog systems (SDS), the system often requests explicit confirmation of user-provided task-relevant concepts. The user utterance following a confirmation question (the post-confirmation utterance) is important to successful dialog outcomes. It may be a simple confirmation or rejection (e.g. yes, no, right, correct), or a correction or a topic change containing new concepts. For example, in a set of one month of calls to the deployed Let's Go! bus route information SDS (Raux et al., 2005), 18% of post-confirmation utterances contain a concept (time, place, or bus) and 20% of these contain a concept type different from that in the system's confirmation prompt. The speech recognition word error rate (WER) on all post-confirmation utterances in this set is 38%, while on post-confirmation utterances with a concept it is 49%. Correct identification of concept types in post-confirmation utterances could lead to improved speech recognition and dialog outcomes.

In this paper, we propose a concept-specific language model adaptation strategy and evaluate it on post-confirmation utterances. We adopt a two-pass recognition approach (Young, 1994). In first-pass recognition, the input utterance is processed using a generic language model trained on post-confirmation utterances. Recognition with a generic model frequently fails on concept words such as Oakland or 61C. We then use acoustic, lexical, and dialog history features to determine the task-related concept type(s) likely to be present in the utterance. Finally, any utterance that is determined to contain a concept type is re-processed using a concept-specific language model.
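The two-pass control flow just described can be sketched as follows. This is a minimal illustration, not the deployed system: the recognizer and the concept-type classifier are passed in as callables, the language model names are invented for the example, and the tie-breaking rule when several concept types are predicted is an arbitrary placeholder.

```python
# Sketch of two-pass recognition with concept-specific LM adaptation.
# LM names and the multi-type tie-break are illustrative assumptions.

GENERIC_LM = "postconfirm-generic"
CONCEPT_LMS = {"time": "postconfirm-time",
               "place": "postconfirm-place",
               "bus": "postconfirm-bus"}

def two_pass_recognize(utterance, recognize, classify_concepts):
    """recognize(utterance, lm_name) -> transcript;
    classify_concepts(transcript, utterance) -> set of concept types."""
    # Pass 1: generic LM trained on post-confirmation utterances.
    first_pass = recognize(utterance, GENERIC_LM)
    # Predict which concept types (time/place/bus) are present, using
    # acoustic, lexical, and dialog-history features.
    types = classify_concepts(first_pass, utterance)
    if not types:
        # Simple confirmation/rejection: keep the generic result.
        return first_pass
    # Pass 2: re-recognize with a concept-specific LM. If several types
    # are predicted we pick one arbitrarily here for illustration.
    lm = CONCEPT_LMS[sorted(types)[0]]
    return recognize(utterance, lm)
```

In a live system the two callables would wrap the ASR engine and the trained concept-type classifier; stubbing them out with lookup tables is enough to exercise the control flow.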
We show that: (1) it is possible to achieve high accuracy in determining the presence or absence of particular concept types in a post-confirmation utterance; and (2) concept-specific language model adaptation can lead to improved speech recognition performance for post-confirmation utterances. In this paper we focus on alternative methods for selecting lexical features for concept type classification in the presence of first-pass recognition errors.

2. Classification Experiment

In the Let's Go! SDS, there are three concept types: time, place, and bus. We train a binary classifier for each concept type using Weka's J48 decision tree implementation (Witten and Frank, 2005). We use acoustic/prosodic (RAW) features (F0 max, energy, utterance duration, and the difference between F0 max in the first and second halves of the utterance (Litman et al., 2006)), and dialog history (DH) features (the present and previous dialog states in this dialog, from the Let's Go! system logs). We also use lexical (LEX) features from first-pass recognition. (All these features are available at run-time in a live SDS.)

Each concept type is associated with different words and phrases, e.g. "leaving", "arriving", "to", "from", "downtown", "61C". Also, the absence of a concept type is associated with particular words, e.g. "yes" and "no", which indicate simple confirmations and rejections. In this domain, concept-related noun phrases are frequently misrecognized in first-pass recognition, so we cannot rely on these for concept type classification. Instead, we have to select a reliable set of lexical features for concept type classification that is robust to first-pass recognition errors. We experimented with three different sets of lexical features obtained using two different methods: manual selection, and automatic selection using mutual information.
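The mutual-information criterion used for automatic feature selection can be sketched as below. The corpus here is a toy stand-in (the real training data is a month of transcribed Let's Go! calls with concept words removed); the scoring follows the standard pointwise formula for mutual information between a binary word-occurrence variable and a binary concept-presence label.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """MI between word occurrence and concept label, from the 2x2
    contingency counts (n11 = word present and concept present, etc.)."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    for nxy, nx, ny in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if nxy > 0:  # 0 * log 0 := 0
            mi += (nxy / n) * math.log((nxy * n) / (nx * ny))
    return mi

def select_features(corpus, k):
    """Rank unigrams by MI with the concept label; ties break
    alphabetically. corpus: list of (transcript, 0/1 label) pairs."""
    vocab = {w for text, _ in corpus for w in text.split()}
    scores = {}
    for w in vocab:
        n11 = sum(1 for t, y in corpus if y == 1 and w in t.split())
        n10 = sum(1 for t, y in corpus if y == 0 and w in t.split())
        scores[w] = mutual_information(
            n11, n10,
            sum(1 for _, y in corpus if y == 1) - n11,
            sum(1 for _, y in corpus if y == 0) - n10)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Toy post-confirmation transcripts, labeled for presence of a "place"
# concept; concept words themselves are assumed already removed.
corpus = [
    ("i am leaving from", 1),
    ("going to the", 1),
    ("yes that is correct", 0),
    ("no", 0),
    ("leaving from the", 1),
    ("yes", 0),
]
```

On this toy corpus, words that co-occur consistently with one label ("from", "leaving", "yes") score highest, mirroring how the method surfaces robust non-concept cue words rather than easily misrecognized concept words.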
Manual selection. In our first method, we manually selected a small set of lexical features likely to be recognized with high accuracy, LEX5: "I" (indicates presence of any concept type), "yes" and "no" (indicate absence of any concept type), and "to" and "from" (indicate presence of place).

Automatic selection. In our second method, we used mutual information between words/phrases and concept types (Manning et al., 2008) to automatically select lexical features from a set of training data. As our training data, we used one month of Let's Go! calls from 2005. We extracted from the data all the transcribed user utterances. We removed all words that realize concepts (e.g. "61C", "Squirrel Hill"), as these are likely to be misrecognized in first-pass recognition. We then extracted as possible lexical features all word unigrams and