Volume 8, No. 5, May-June 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
ISSN No. 0976-5697
© 2015-19, IJARCS All Rights Reserved

Implementation of a TTS System for Devanagari Konkani Language using Festival

Nilesh B. Fal Dessai
Department of Computer Science & Technology
Goa University, Taleigao Plateau
Goa, India

Gaurav A. Naik
Info Tech Corporation of Goa Limited
IT Hub, Altinho, Panaji
Goa, India

Jyoti D. Pawar
Department of Computer Science & Technology
Goa University, Taleigao Plateau
Goa, India

Abstract: A Text-to-Speech (TTS) synthesizer is an application that converts text to speech. Developing a speech synthesis system is a challenging task because the input text may be ambiguous and different words are pronounced in different ways, which demands effort during text pre-processing. This paper discusses various aspects of the Festival and Festvox frameworks in a Linux environment and their use in implementing a TTS system for the Devanagari Konkani language. Festival does not provide complete language-processing support for every language. Experimental results on a text segment of 100 Konkani sentences show a word phonetization accuracy of 64%, indicating scope for improvement in the quality of the output speech if the segments of voice used are higher than unit selection voices.

Keywords: TTS; Speech Synthesis; Devanagari; Konkani; Festival; Festvox

I. INTRODUCTION

India is a multilingual country with a variety of scripts and hundreds of spoken dialects. It is desirable that information and ICT-based services be delivered to a large portion of the population in their own language in the form of voice. A lot of research is currently being carried out on text-to-speech processing for many Indian languages, and synthesis systems for these languages are in great demand.
The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. An ideal speech synthesizer is both natural and intelligible, and hence speech synthesis systems usually try to maximize both characteristics [1]. Intelligibility of the output speech has now reached an adequate level for most applications, especially for the visually challenged and illiterate masses.

Konkani, the official language of the State of Goa in India, is also a minority language in states such as Karnataka, Kerala and Maharashtra. Konkani is spoken by about 3.6 million people and is written in both the Devanagari and Roman scripts. No concrete work has been carried out for Konkani in the area of text to speech. The focus of this work is to study the tools, resources and techniques for text-to-speech processing that have been developed for Indian languages and to implement a TTS system for Devanagari Konkani using Festival in a Linux environment. Festival is widely used for the implementation of TTS systems for many languages [2] [3] [4].

The paper is organized as follows: Sections II and III outline the TTS system and the synthesis techniques. Sections IV, V and VI detail the Festival framework, its implementation for Konkani, and the evaluation and discussion of the implemented system, respectively, followed by the conclusion at the end.

II. GENERAL FUNCTIONAL DIAGRAM OF A TEXT TO SPEECH SYSTEM

The general architecture of a corpus-based TTS system is depicted in Figure 1. Speech synthesis mainly uses two processing components: the NLP (Natural Language Processing) and the DSP (Digital Signal Processing) modules [5] [6].

Figure 1: The general architecture of a corpus-based TTS system

This schematic applies to every data-driven (i.e.
any corpus-based) TTS system, regardless of the underlying technology (e.g., unit selection or parametric). The NLP component accounts for every aspect of the linguistic processing of the input text, whereas the DSP component accounts for the speech signal manipulation and the output generation. For a unit selection TTS system, besides the speech units (usually diphones), the speech database contains all the data necessary for the unit selection stage of the synthesis. In particular, the NLP component is mainly responsible for parsing, analysing and transforming the input text into an