The African Speech Technology Project: An Assessment

Roux, JC *, Louw, PH * & Niesler, TR **

* Research Unit for Experimental Phonology
** Electrical and Electronic Engineering
Stellenbosch University
Stellenbosch
jcr@sun.ac.za, phlouw@sun.ac.za, trn@sun.ac.za

Abstract

This paper reflects on the recently completed African Speech Technology (AST) Project. The AST Project successfully developed eleven annotated telephone speech databases for five languages spoken in South Africa, namely Xhosa, Southern Sotho, Zulu, English and Afrikaans. These databases were used to train and test speech recognition systems applied in a multilingual telephone-based prototype hotel booking system. An overview is given of the database design and contents. The acquisition of the data is discussed with regard to the telephony interface, as well as speaker recruitment and briefing. Particular reference is made to some of the practical implications of acquiring appropriate data in under-developed communities. Database management processes such as transcription, quality control and validation are explained. This is followed by information on the development of the prototype. Results of usability tests are discussed, followed by an assessment of the Project as a whole.

Introduction

The research project entitled "Promoting the development of the official languages of South Africa through language and speech technology applications" (working title: African Speech Technology - AST) was a four-year project funded by the Innovation Fund of the Department of Science and Technology (DST) of the South African national government. The Project commenced in January 2000 and was introduced to the LREC community in Athens in the same year (Roux et al., 2000). The Project was successfully completed in December 2003.
The Project was motivated by an appreciation of the need to develop the indigenous languages of South Africa at a technological level in order to keep these languages abreast of developments in the ICT field, and to facilitate access to information for all citizens in a developing country. As one of its main aims, AST developed telephone speech databases for five of South Africa's eleven official languages, namely Xhosa, Southern Sotho, Zulu, South African English and Afrikaans. These databases were fully transcribed both orthographically and phonetically.

A second main aim of the Project was to develop a multilingual telephone-based hotel booking system as a prototype for demonstration purposes. This system allows speech to be both recognised and synthesised in one of three languages. A user may use the system to negotiate a reservation at a hypothetical hotel in his or her preferred language. The large and varied speech databases produced as part of the Project were especially instrumental in the successful development of the speaker-independent speech recognition module, which forms part of the overall system.

Database design and contents

Language coverage

The internal speech variation in spoken South African English and Afrikaans is considerable and, in many instances, culturally bound. To make provision for these known varieties, a total of eleven databases based on the five languages were developed. The English and Afrikaans databases are divided into five and three sub-databases respectively, based on the different speech varieties used by mother-tongue and non-mother-tongue speakers. For the English database, English mother-tongue speakers (database EE) as well as four groups of non-mother-tongue speakers were targeted, namely Black, Coloured, Asian and Afrikaans speakers (databases BE, CE, IE, AE).
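As an illustration only, the eleven database codes used in this section (the English codes above, and the Afrikaans, Xhosa, Zulu and Southern Sotho codes described next) can be collected in a small lookup table. The tuple layout, the name AST_DATABASES and the helper databases_per_language are assumptions made for this sketch; they are not part of the AST specification.

```python
from collections import Counter

# Database codes as named in the paper: (code, language, speaker group).
# The data layout and all identifiers here are hypothetical, chosen only
# to summarise the inventory described in the text.
AST_DATABASES = [
    ("EE", "English", "English mother-tongue"),
    ("BE", "English", "Black"),
    ("CE", "English", "Coloured"),
    ("IE", "English", "Asian"),
    ("AE", "English", "Afrikaans"),
    ("AA", "Afrikaans", "Afrikaans mother-tongue"),
    ("BA", "Afrikaans", "Black"),
    ("CA", "Afrikaans", "Coloured"),
    ("XX", "Xhosa", "Xhosa mother-tongue"),
    ("ZZ", "Zulu", "Zulu mother-tongue"),
    ("SS", "Southern Sotho", "Southern Sotho mother-tongue"),
]


def databases_per_language(databases):
    """Count how many databases exist for each language."""
    return Counter(language for _code, language, _group in databases)


counts = databases_per_language(AST_DATABASES)
for language, n in counts.items():
    print(f"{language}: {n} database(s)")
```

Counting the entries per language gives a quick check that the table matches the stated design: five English and three Afrikaans sub-databases, plus one database each for Xhosa, Zulu and Southern Sotho.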
The Afrikaans database group included speech produced by Afrikaans mother-tongue speakers, as well as Black and Coloured speakers (databases AA, BA, CA). Within the Black speaker group, speakers having any one of Xhosa, Zulu, Southern Sotho (Sesotho), Tswana (Setswana) or Northern Sotho (Sepedi) as their mother tongue were included. For the Xhosa, Zulu and Southern Sotho databases, only mother-tongue speakers were recruited (databases XX, ZZ, SS).

General description of contents

The AST contents specification totals 38 to 40 utterances per phone call, comprising a mixture of spontaneous and read speech. The types of read utterances elicited include isolated digit items, natural numbers, dates, times, money amounts, application- or domain-specific words or phrases, and phonetically rich words and sentences. Spontaneous responses were gathered by asking the speakers to state their age, home language and date of birth, and to answer yes/no questions.

The acquisition and management of speech databases

Telephony interface

An ISDN Primary Rate Interface (PRI) was required in order to digitally record the incoming calls on eleven channels simultaneously, as well as to log call information. A Dialogic D/300-SC board, which interfaced directly to an ISDN PRI channel from Telkom, was used. When a speaker dialled the toll-free number, 93