DIALOG-CONTEXT DEPENDENT LANGUAGE MODELING COMBINING N-GRAMS AND STOCHASTIC CONTEXT-FREE GRAMMARS

Kadri Hacioglu and Wayne Ward
Center for Spoken Language Research
University of Colorado at Boulder
E-mail: {hacioglu,whw}@cslr.colorado.edu

ABSTRACT

In this paper, we present our research on dialog dependent language modeling. In accordance with a speech (or sentence) production model in a discourse, we split language modeling into two components: dialog dependent concept modeling and syntactic modeling. The concept model is conditioned on the last question prompted by the dialog system and is structured using n-grams. The syntactic model, which consists of a collection of stochastic context-free grammars, one for each concept, describes the word sequences that may be used to express the concepts. The resulting LM is evaluated by rescoring N-best lists. We report a significant perplexity improvement with a moderate word error rate reduction within the context of the CU Communicator system, a dialog system for making travel plans by accessing information about flights, hotels and car rentals.

1. INTRODUCTION

Statistical modeling of spoken language structure is crucial for the speech recognition and speech understanding components of dialog systems. Two broad statistical language models (LMs) that have been extensively studied are n-grams [1] and stochastic context-free grammars (SCFGs) [2]. The standard n-gram LM tries to capture the structure of a spoken language by assigning probabilities to words conditioned on the n-1 preceding words. The value of n is usually kept low (2 or 3) since (a) the number of parameters increases exponentially with n and (b) the training data is sparse, particularly in early phases of system development. Therefore, standard n-gram LMs do not model longer-distance correlations. They also do not take advantage of linguistic knowledge or structure. A SCFG consists of a number of non-terminals, terminals, production rules and rule probabilities.
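The n-gram idea above can be made concrete with a minimal sketch (not from the paper): a maximum-likelihood bigram model, i.e. n = 2, trained from raw counts. The sentence data and helper names are illustrative assumptions.

```python
# Illustrative maximum-likelihood bigram LM (n = 2); not the paper's model.
from collections import defaultdict

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s>/</s>."""
    uni, bi = defaultdict(int), defaultdict(int)
    for words in sentences:
        toks = ["<s>"] + words + ["</s>"]
        for w1, w2 in zip(toks, toks[1:]):
            uni[w1] += 1
            bi[(w1, w2)] += 1
    return uni, bi

def bigram_prob(uni, bi, w1, w2):
    """P(w2 | w1) by maximum likelihood; 0.0 if w1 was never seen."""
    return bi[(w1, w2)] / uni[w1] if uni[w1] else 0.0
```

In practice such counts are smoothed; even this toy version makes the sparsity problem visible, since any unseen bigram gets probability zero.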
A SCFG defines a stochastic formal language. SCFGs can be defined at two levels: sentence level and phrase level. Sentence-level SCFGs provide a complete syntactic analysis across a sentence, considering all words. They are expected to work very well for grammatical sentences (those covered by the grammar) but fail completely on sentences with ungrammatical constructions. So, their use in spoken language applications is very limited. On the other hand, phrase-level SCFGs focus on the syntax of sentence fragments. They allow partial parsing of sentences and are more appropriate for spoken language modeling.

The work is supported by DARPA through SPAWAR under grant #N66001-00-2-8906.

SCFGs have properties complementary to n-grams. They have been combined in various ways to obtain LMs with better perplexity and speech recognition/understanding performance [3, 4, 5, 6]. A promising approach is the use of semantically motivated phrase-level SCFGs to parse a sentence into a sequence of concept (or semantic) tokens, which are modeled using n-grams.

In this paper, we consider the language modeling problem within the framework of concept decoding (an integrated approach to speech recognition and understanding) based on the speech production model in a typical dialog. This framework uses a dialog context dependent LM with two components that we describe in Section 2. The idea of using dialog contextual knowledge to improve speech recognition and speech understanding is not new [7]. Dialog dependent LMs have recently been investigated in [8, 9, 10, 11]. The method presented here extends the work in [4], which was developed from ideas presented in [12], to dialog dependent language modeling. We use the resulting LM to rescore N-best lists from our dialog system, known as the CU Communicator [13]. The rescoring scheme is a crude approximation to the integrated approach.
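The parsing step described above, mapping word spans to concept tokens so that an n-gram can model the concept sequence, can be sketched as follows. The patterns here are a toy stand-in for the phrase-level SCFGs (which would assign probabilities and allow recursion); all phrases and concept names are invented for illustration.

```python
# Toy stand-in for phrase-level SCFG parsing: greedy longest-match
# replacement of known phrases by concept tokens; unmatched words pass
# through unchanged (partial parsing). Patterns are illustrative only.
CONCEPT_PATTERNS = {
    ("denver",): "[city]",
    ("new", "york"): "[city]",
    ("on", "monday"): "[date]",
}

def parse_concepts(words):
    """Replace matched phrases with concept tokens, longest match first."""
    out, i = [], 0
    while i < len(words):
        for n in (2, 1):  # try longer phrases before single words
            phrase = tuple(words[i:i + n])
            if len(phrase) == n and phrase in CONCEPT_PATTERNS:
                out.append(CONCEPT_PATTERNS[phrase])
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out
```

The resulting token sequence, e.g. `["fly", "to", "[city]", "[date]"]`, is what the dialog dependent concept n-gram would be trained on.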
We report a significant perplexity improvement along with a moderate improvement in word error rate after N-best list rescoring.

The paper is organized as follows. Section 2 presents the integrated approach as motivation for the use of dialog dependent language modeling. In Section 3, we explain an N-best rescoring scheme as a first-order approximation to the integrated approach. Section 4 explains the syntactic and semantic models in detail. Experimental results are presented in Section 5. Concluding remarks are made in the last section.

2. INTEGRATED APPROACH

The speech production model on which we base our approach is depicted in Figure 1. It is a slightly modified version of the model in [14]. The user is assumed to have a specific goal that does not change throughout the dialog. According to the goal and the dialog context, the user first picks a set of concepts with respective values and then uses phrase generators associated with the concepts to generate the word sequence. The word sequence is next mapped into a sequence of phones and converted into a speech signal by the user's vocal apparatus, which we finally observe as a sequence of acoustic feature vectors.
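This generative story, concepts chosen given the dialog context, then words generated per concept, suggests scoring a recognition hypothesis as P(concepts | prompt) x P(words | concepts). A toy N-best rescoring sketch under that factorization follows; both probability tables contain made-up numbers and are not estimates from the paper.

```python
# Hypothetical N-best rescoring: combined log probability of a
# dialog-context-dependent concept score and per-concept phrase scores.
# All table entries are invented for illustration.
import math

# P(concept sequence | last system prompt) -- made-up values
CONCEPT_LM = {
    ("prompt_city", ("[city]",)): 0.6,
    ("prompt_city", ("[date]",)): 0.1,
}
# P(word | concept), standing in for a per-concept SCFG -- made-up values
PHRASE_LM = {
    ("[city]", "denver"): 0.5,
    ("[date]", "denver"): 0.01,
}

def rescore(prompt, hyps):
    """Return the (concepts, words) hypothesis with the best combined score."""
    def score(hyp):
        concepts, words = hyp
        s = math.log(CONCEPT_LM.get((prompt, concepts), 1e-9))
        for c, w in zip(concepts, words):
            s += math.log(PHRASE_LM.get((c, w), 1e-9))
        return s
    return max(hyps, key=score)
```

The point of the sketch is the factorization: conditioning the concept term on the system's last prompt is what makes the LM dialog-context dependent.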