An Interactive Directory Assistance Service for Spanish with Large-Vocabulary Recognition

R. Córdoba, R. San-Segundo, J.M. Montero, J. Colás, J. Ferreiros, J. Macías-Guarasa, J.M. Pardo
Grupo de Tecnología del Habla. Departamento de Ingeniería Electrónica. Universidad Politécnica de Madrid
E.T.S.I. Telecomunicación. Ciudad Universitaria s/n, 28040 Madrid, Spain
cordoba@die.upm.es
http://www-gth.die.upm.es

ABSTRACT

In the EU-funded IDAS project (LE4-8315), demonstrators providing an automated interactive telephone-based directory assistance service have been developed by ten partners from Germany, Greece, Spain and Switzerland [6]. In this paper we focus on the Spanish demonstrator. In particular, we describe the following aspects:

• The general architecture of the system, paying special attention to the speech recognition module. We present new alternatives for the estimation of continuous HMMs and the agglomerative clustering of context-dependent units.
• The most common problems encountered in the development of this kind of system and in its operation in a real environment.
• Impressions, opinions and scores from real-world users of the system.

Keywords: large vocabulary recognition, telephone-based, directory assistance service, dialog.

1. INTRODUCTION

In the IDAS project, we address the challenging problem of automating the provision of directory assistance services to the public over the telephone network. This technical challenge places high demands on each of the speech processing components:

• A speech recognizer for large vocabularies over the telephone.
• A speech production system able to speak any imaginable phone directory entry.
• A dialogue component that can interpret user inputs and ask the right questions in order to guide users quickly to the desired information.
Directory assistance services are very attractive for telephone companies, because they save operator time and the information that has to be provided is very limited (the desired telephone number). This brevity reduces user rejection, especially if the service is not free.

When a system operates in real life, it is important to achieve user satisfaction from the beginning. However, speech technology is still far from perfect. One of the project's main focuses was therefore to provide the user with a high success rate, independently of the current state of the technology. To this end, the system design incorporates an operator fallback component.

The main difficulty in a system like this is the noise and the reduced signal-to-noise ratio that are common on telephone lines. The second difficulty is the high degree of confusability that arises when considering 10,000 Spanish surnames, because we need their exact transcription. As a perfect recognition system is not feasible, we have to introduce additional alternatives in the dialog: user confirmation and spelling. We have developed a spelling module that is very robust and helps to disambiguate a great number of entries.

In this paper, we describe a series of improvements that have been applied to a large-vocabulary isolated-word recognition system using continuous models. We cover improvements in the techniques for continuous HMMs and agglomerative clustering.

2. DESCRIPTION OF THE SYSTEM

The demonstrator presented in this paper has the following characteristics:

• Representative database with some 1 million records.
• 4 different vocabularies: cities, first names, surnames and company names, each with 10,000 words. We have obtained them from the results of the Onomastica project (except for the company names).
• Provides telephone numbers for private users and companies.
• System-driven dialogue optimized to increase transaction success.
• Operator fallback if the recognition fails.
• We have recorded the most common system prompts to improve the general acceptance of the system's voice output.
• All the messages used for confirmation are generated using our Spanish text-to-speech system [1]. Special attention has been paid to the pronunciation of names.

When the recognition module fails, and before falling back to the operator, the user is asked to spell the misrecognized data as an intermediate step. If the spelling fails too, the operator receives a dialog box containing all the information about the call: recognition results, unrecognized entries, and icons that play back what the user said at each step of the dialog. This way, the operator is able to complete the missing entries, allowing the system to query the database with the correct information. Everything is resolved transparently, without any intervention from the user, and it is guaranteed that all calls can be handled.
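
The escalation described above (recognition, then spelling, then operator) can be sketched roughly as follows. This is only an illustrative sketch, not the implementation used in the demonstrator: every name in it (`CallLog`, `fill_field`, `handle_call`, and the `recognize`, `confirm`, `spell` and `operator` callbacks) is hypothetical. It also assumes that spelled results are confirmed in the same way as recognized ones, and that the operator is consulted once, after all fields have been attempted; the text does not specify either point.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# All names here are hypothetical; the paper does not describe its code.


@dataclass
class CallLog:
    """Information about the call that an operator would be shown."""
    recognized: Dict[str, str] = field(default_factory=dict)
    unrecognized: List[str] = field(default_factory=list)


def fill_field(
    name: str,
    recognize: Callable[[str], Optional[str]],
    confirm: Callable[[str, str], bool],
    spell: Callable[[str], Optional[str]],
    log: CallLog,
) -> Optional[str]:
    """Try spoken recognition first, then spelling; log the outcome."""
    for attempt in (recognize, spell):
        candidate = attempt(name)
        # Assumption: the user confirms the candidate in both steps.
        if candidate is not None and confirm(name, candidate):
            log.recognized[name] = candidate
            return candidate
    log.unrecognized.append(name)
    return None


def handle_call(
    fields: List[str],
    recognize: Callable[[str], Optional[str]],
    confirm: Callable[[str, str], bool],
    spell: Callable[[str], Optional[str]],
    operator: Callable[[str, CallLog], str],
) -> Dict[str, str]:
    """Collect every field, falling back to the operator for the gaps."""
    log = CallLog()
    for name in fields:
        fill_field(name, recognize, confirm, spell, log)
    # The operator completes only the entries that are still missing.
    for name in list(log.unrecognized):
        log.recognized[name] = operator(name, log)
    log.unrecognized.clear()
    return log.recognized
```

In this sketch the operator callback receives the whole `CallLog`, mirroring the dialog box that shows recognition results and unrecognized entries. Because the operator always returns a value, every requested field ends up filled, which corresponds to the guarantee that all calls can be handled.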