Semantic Information Processing Of Spoken Language
A.L. Gorin, J.H. Wright, G. Riccardi, A. Abella and T. Alonso
AT&T Labs, Speech Research
180 Park Avenue
Florham Park, N.J. 07932
{algor, jwright, dsp3, abella, tma}@research.att.com

ABSTRACT
The next generation of voice-based user interface technology will enable easy-to-use automation of new and existing communication services. A critical issue is to move away from highly-structured menus to a more natural human-machine paradigm. In recent years, we have developed algorithms which learn to extract meaning from fluent speech via automatic acquisition and exploitation of salient words, phrases and grammar fragments from a corpus. These methods have been previously applied to the ‘How may I help you?’ task for automated operator services, in English, Spanish and Japanese. In this paper, we report on a new application of these language acquisition methods to a more complex customer care task. We report on empirical comparisons which quantify the increased linguistic and semantic complexity over the previous domain. Experimental results on call-type classification are reported for this new corpus of 10K utterances from live customer traffic.

1. INTRODUCTION
The next generation of voice-based user interface technology will enable easy-to-use automation of new and existing communication services. A critical issue is to move towards a more natural human-machine paradigm. By natural, we mean that the machine understands what people actually say, in contrast to what a system designer would like them to say. This approach differs from menu-driven or strongly-prompted systems, where many users are unable or unwilling to navigate such highly structured interactions. This research aims to shift the burden from human to machine, so that the system adapts to people’s language rather than forcing users to learn the machine’s jargon.
In particular, we have developed algorithms which learn to automatically extract meaning from fluent speech. A key intuition is that some linguistic events are crucial to recognize and understand for a task, while others are not. We have quantified this idea via salience, which measures the information content of an event for a task [Go95]. Algorithms have been developed which automatically acquire and exploit salient words, phrases and grammar fragments from a corpus [Go97][Wr97][Ar99]. These methods have been previously applied to the ‘How may I help you?’ task for automated operator services, in English [Go97], Spanish and Japanese [Ba00]. The early experiments were based on excerpts from human/human interactions drawn from live customer traffic. In later experiments, spoken language understanding (SLU) was embedded in a dialog system [Ab97][Ab99] and experimentally evaluated [Ri00] on 20K human/machine transactions, again drawn from live customer traffic. The primary focus of SLU in these experiments has been call-type classification, i.e. determining which service type a customer is requesting. Other researchers have reported on analogous experiments in other domains [Ca98][Ed99]. In the operator services domain, the task involves placing telephone calls, specifying billing methods for those calls (e.g. collect, card, etc.), and requesting information about making those calls (e.g. rates, area codes, etc.). In that domain, we have also developed methods for extracting auxiliary information, such as phone and credit-card numbers, embedded in natural spoken language [Ra99]. In this paper, we report on a new application of our language acquisition methods to a more complex customer care task. In this task, users ask questions about their bills, calling plans, etc. Intuitively, this is a more complex domain than operator services.
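To make the notion of salience concrete, here is a minimal sketch. This section does not give the formula, so we assume the form commonly associated with [Go95]: the salience of a word w is S(w) = Σ_k P(c_k|w) log₂ [P(c_k|w) / P(c_k)], i.e. the Kullback-Leibler divergence between the call-type distribution given w and the call-type prior. The function name `salience`, the toy utterances, and the call-type labels COLLECT and CARD are hypothetical. Probabilities are unsmoothed relative frequencies, which a real system would need to smooth.

```python
import math
from collections import Counter, defaultdict


def salience(corpus):
    """Return {word: salience in bits} from (utterance, call_type) pairs.

    Assumption (not stated in this section): S(w) = sum_c P(c|w) * log2(P(c|w) / P(c)),
    the KL divergence between the call-type distribution given w and the prior.
    Probabilities are unsmoothed relative frequencies.
    """
    type_counts = Counter()                  # utterances per call type
    word_type_counts = defaultdict(Counter)  # word -> call type -> utterance count
    for text, call_type in corpus:
        type_counts[call_type] += 1
        # Count each word at most once per utterance.
        for w in set(text.lower().split()):
            word_type_counts[w][call_type] += 1

    n = sum(type_counts.values())
    prior = {c: k / n for c, k in type_counts.items()}

    scores = {}
    for w, counts in word_type_counts.items():
        total = sum(counts.values())
        scores[w] = sum(
            (k / total) * math.log2((k / total) / prior[c])
            for c, k in counts.items()
        )
    return scores


# Hypothetical toy corpus with two equally likely call types.
corpus = [
    ("I want to make a collect call", "COLLECT"),
    ("collect call please", "COLLECT"),
    ("I want to charge this to my card", "CARD"),
    ("put it on my calling card please", "CARD"),
]
scores = salience(corpus)
# 'collect' and 'card' each score 1.0 bit; 'please' scores 0.0.
```

A word that occurs evenly across call types, such as "please" here, carries no information about the request and gets zero salience. A word that occurs with only one call type gets the maximum score allowed by the prior.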
In this paper, we report on empirical comparisons which quantify the increased linguistic and semantic complexity of this new task over the previous domain. Experimental results are reported and compared for a new corpus of 10K human/human dialogs recorded from live customer traffic. In Section 2, we describe the new database and how it was collected. Section 3 discusses the semantic complexity of this customer care task and compares it to the operator services domain. In Section 4, we do the same for linguistic complexity. An initial experimental evaluation of call-type classification from speech for this customer care task is reported in Section 5, demonstrating the portability and scalability of our language acquisition methods.