NATURAL LANGUAGE UNDERSTANDING IN THE DACST-AST DIALOGUE SYSTEM

T.R. Niesler and J.C. Roux
trn@dsp.sun.ac.za, jcr@maties.sun.ac.za
Department of Electrical Engineering; Research Unit for Experimental Phonology
University of Stellenbosch
Private Bag X1, Matieland, Stellenbosch 7602, South Africa

ABSTRACT

This paper describes the natural language understanding (NLU) component of the DACST-AST spoken dialogue system. We adopt a finite-state architecture and develop a syntax that allows these finite-state networks to be defined in a modular fashion. Meaning is associated with a particular path through a network by embedding semantic tags at appropriate points in its definition. The understanding process consists of a parsing operation that determines whether a user utterance is contained within a given finite-state network. The semantic tags associated with the path resulting from a successful parse represent the information that has been “understood” from the user utterance.

1. INTRODUCTION

The DACST-AST project is supported by the South African Department for Arts, Culture, Science and Technology (DACST) and has among its long-term aims the development of resources and expertise in the field of human language technology within the South African context. Human language technology has an important role to play in moving South Africa’s strongly multilingual and multi-ethnic society into the information age by improving the access and flow of information between individuals and private or public institutions. With 11 official languages and an abundance of accents and dialects, the successful development and deployment of speech technology presents many challenges. Initial activity within the DACST-AST project has focused on the gathering, transcription and validation of speech resources for 5 of the 11 languages and on the design and development of speech and language processing algorithms to allow the realisation of spoken dialogue systems [4].
This paper describes the development of the natural language understanding (NLU) component, which is a key subsystem of a dialogue system.

2. SPOKEN DIALOGUE SYSTEMS AND NATURAL LANGUAGE UNDERSTANDING

A dialogue may be described as an interaction between two parties in which information is communicated between the parties in a number of sequential turns, where a turn refers to a single uninterrupted transfer of information from one party to the other. When the method of communication is speech, we refer to a spoken dialogue. A machine designed to maintain a spoken dialogue with a person is a spoken dialogue system. Since no means of communication other than speech will be considered here, the term “dialogue system” will be taken to mean “spoken dialogue system”. Hence each turn of a spoken dialogue consists of either a system utterance or a user utterance.

The task of natural language understanding within the context of a dialogue system is to extract meaning from a user utterance. On the basis of this meaning, the dialogue system must decide on how to proceed with the dialogue. A simplified diagram illustrating the architecture of the DACST-AST dialogue system is shown in figure 1.

[Figure 1: Architecture of the DACST-AST dialogue system. Components: speech recognition, natural language understanding, dialogue control, database, speech synthesis. Labelled flows: recognised utterance, meaning of utterance, retrieve information, system response.]

To illustrate the operation of the system in figure 1, consider as a particular example a dialogue system with which the user can make a reservation at a certain hotel. With reference to figure 1, a typical dialogue might proceed as follows. First the user utters a sentence or phrase corresponding to his or her turn in the dialogue. This utterance is transcribed into text by the speech recognition component.
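As a rough illustration of the data flow in figure 1, the following Python sketch passes one user turn through stand-ins for the four processing stages. All function names, the `Meaning` structure, the tag names and the toy database are hypothetical and serve only to show how the components connect. In particular, the keyword spotting in `understand` is a placeholder and is not the finite-state parser described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Meaning:
    """Information extracted from one user utterance (hypothetical structure)."""
    tags: dict = field(default_factory=dict)


def recognise(audio: str) -> str:
    # Stand-in for the speech recogniser: the "audio" is already text here.
    return audio.lower().strip()


def understand(text: str) -> Meaning:
    # Placeholder keyword spotting, NOT the finite-state parser of the paper.
    words = text.split()
    tags = {}
    if "room" in words:
        tags["request"] = "booking"
    if "single" in words:
        tags["room_type"] = "single"
    if "tonight" in words:
        tags["date"] = "today"
    return Meaning(tags)


def control(meaning: Meaning, database: dict) -> str:
    # Dialogue control: choose the next system action from the meaning.
    if meaning.tags.get("request") != "booking":
        return "Sorry, how can I help you?"
    room_type = meaning.tags.get("room_type")
    if room_type is None:
        # Required information is missing, so ask the user for it.
        return "What kind of room would you like?"
    # Retrieve information: check availability in the (toy) database.
    if database.get(room_type, 0) > 0:
        return f"A {room_type} room is available."
    return f"Sorry, no {room_type} rooms are available."


def synthesise(response: str) -> str:
    # Stand-in for speech synthesis: return the text that would be spoken.
    return response


def dialogue_turn(audio: str, database: dict) -> str:
    # One user turn followed by one system turn, in the order of figure 1.
    return synthesise(control(understand(recognise(audio)), database))
```

Calling `dialogue_turn` with a hotel-reservation utterance and a dictionary mapping room types to the number of free rooms returns the text of the system response for that turn.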
Let us assume for now that the user’s utterance is:

“Could I have a single room for tonight please”

and that this has been correctly transcribed by the speech recogniser. The recognised text is interpreted by the natural language understanding component, and its meaning is passed on to the dialogue controller. For our example, this meaning would be (i) that a booking has been requested, (ii) that it is for a single room and (iii) that it is for the same day. The understanding component must extract these three items of information from the recognised user utterance. Based on this inferred meaning, the dialogue controller decides on an appropriate next action. This may be the retrieval of some requested information from a database, for example to check whether there are in fact any single rooms available that evening. Alternatively, the system may request from the user some further information that it requires to perform its task; for example, it may ask the user whether he or she would like a room with a sea view or one with a mountain view. Once the system has decided on the appropriate next action, it responds to the user via the speech synthesis module.

Due to the many ways in which a particular message may be communicated by human language, the user’s input may be expected to be highly variable. The NLU system must extract the important information from such natural language input and