A Hybrid Text-To-Speech Approach for an Advanced Afrikaans System

Francois Rousseau and Daniel Mashao
Department of Electrical Engineering, University of Cape Town, Rondebosch, Cape Town, South Africa
frousseau@crg.ee.uct.ac.za, daniel.moshao@ebe.uct.ac.za

Abstract – Understandability, flexibility, naturalness and pleasantness are the requirements of an advanced TTS system. Limited domain unit selection synthesis is a well-known and trusted concatenative synthesis technique that reliably produces TTS systems with a high degree of understandability, naturalness and pleasantness. The technique cannot, however, provide the required flexibility, and therefore cannot be used on its own to design advanced TTS systems. This paper aims to add flexibility to systems built with this technique by finding a suitable back-up voice to synthesize words that are not in the vocabulary of the limited domain. The resulting system is known as a hybrid TTS system. A diphone concatenative system and an open domain unit selection synthesis system were both implemented for this task. Results from subjective listening tests show that the open domain system is the more suitable back-up voice because its voice quality is much higher than that of the diphone system. Combining the limited and open domain systems into a hybrid TTS system results in a single system that meets all the requirements of an advanced TTS system for Afrikaans.

Index terms: hybrid TTS, limited domain, open domain and diphone synthesis

I. INTRODUCTION

The quality of a TTS system is determined by how well it measures up to the requirements of an advanced TTS system. The understandability of the system is the most important requirement, since it shows how well a listener understands a synthesized message on first listening [1]. The flexibility of the system is the second most important requirement, because it shows the system's ability to synthesize any possible linguistic entry. The naturalness and pleasantness of the system both measure how closely the synthetic speech resembles a human voice [1] [2].

Limited domain (Ldom) unit selection synthesis (USS) is a concatenative synthesis technique of the Festival speech synthesis system¹ that can produce TTS systems with a high degree of understandability, naturalness and pleasantness. These systems, however, lack the required flexibility because they have a predefined vocabulary [4] [5]. By using the hybrid approach described by [1] and [5], flexibility can be added to these systems. This is done by using a back-up voice to synthesize out-of-vocabulary words.

¹ A speech synthesis engine designed by the Centre for Speech Technology Research (CSTR), University of Edinburgh [3]

The work in this paper builds on the work done by [1], where a diphone concatenative synthesis (DCS) Afrikaans TTS system is used in the hybrid approach to synthesize out-of-vocabulary words. DCS can produce very flexible TTS systems, but falls short on the other requirements [6]. In this work we experiment with an open domain (Odom) unit selection system as a possible new back-up voice for the limited domain system. Odom USS has the advantage that it can produce very flexible TTS systems with a high degree of understandability, naturalness and pleasantness, but has the disadvantage that it can be inconsistent [6]. Using the Odom Afrikaans system in a new hybrid approach (as shown in Figure 1.1) means that the resulting system should be more acceptable than the system built by [1], since the quality of Odom systems is usually higher than that of DCS systems [6].

[Figure: limited domain USS (understandability, naturalness, pleasantness) + open domain USS (flexibility, naturalness, pleasantness), combined through hybrid speech synthesis into an advanced Afrikaans TTS]
Figure 1.1: Hybrid approach to an advanced TTS system for Afrikaans

The need for an Afrikaans TTS system comes with the growing interest in integrating modern technology into the eleven official languages of South Africa. Only 8% of the country speaks English as its home language according to [7]. The majority of modern technological systems in the country operate in English, and many people are therefore afraid to use these systems, either because they are uncomfortable with the language or because they have limited computer literacy skills. The aim is to remove this fear by developing and implementing multilingual technological systems that all users can understand and relate to. With such systems, people would be able to use technology simply by communicating with it in their mother tongues. An Afrikaans TTS system is a step toward this goal, serving as a benchmark for the design of the first complete multilingual TTS system for all South African languages.
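
The hybrid approach outlined in this introduction, in which a back-up voice handles words outside the limited domain vocabulary, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Festival API. The names `hybrid_synthesize` and `make_fake_voice` are hypothetical, the "audio" is faked as tagged strings, and routing word by word (rather than per utterance) is an assumption made only for illustration.

```python
from typing import Callable, Iterable, List, Set

# A "voice" is any callable that maps a word to audio. In this sketch the
# audio is faked as a tagged string; a real system would return waveform data.
Voice = Callable[[str], str]


def make_fake_voice(name: str) -> Voice:
    """Return a hypothetical stand-in voice that tags each word with its name."""
    return lambda word: f"{name}:{word}"


def hybrid_synthesize(
    words: Iterable[str],
    ldom_vocabulary: Set[str],
    ldom_voice: Voice,
    backup_voice: Voice,
) -> List[str]:
    """Send in-vocabulary words to the limited domain voice and all
    other words to the back-up voice (assumed per-word routing)."""
    segments: List[str] = []
    for word in words:
        # Vocabulary lookup is case-insensitive; the vocabulary is assumed
        # to be stored in lower case.
        if word.lower() in ldom_vocabulary:
            segments.append(ldom_voice(word))
        else:
            segments.append(backup_voice(word))
    return segments


if __name__ == "__main__":
    vocabulary = {"die", "tyd", "is"}  # toy limited domain vocabulary
    ldom = make_fake_voice("ldom")
    odom = make_fake_voice("odom")  # open domain voice acting as back-up
    print(hybrid_synthesize(["Die", "tyd", "is", "middernag"], vocabulary, ldom, odom))
```

In this toy run the first three words fall inside the limited domain vocabulary, while "middernag" does not and is passed to the back-up voice. Swapping the back-up callable is all that is needed to compare a diphone voice against an open domain voice in the same hybrid arrangement.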