Volume 8, No. 5, May-June 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved 386
ISSN No. 0976-5697
Implementation of a TTS System for Devanagari Konkani Language using Festival
Nilesh B. Fal Dessai
Department of Computer Science & Technology
Goa University, Taleigao Plateau
Goa, India
Gaurav A. Naik
Info Tech Corporation of Goa Limited
IT Hub, Altinho, Panaji
Goa, India
Jyoti D. Pawar
Department of Computer Science & Technology
Goa University, Taleigao Plateau
Goa, India
Abstract: A Text to Speech (TTS) synthesizer is an application that converts text to speech. Developing a speech synthesis system is a
challenging task because the input text may be ambiguous and different words are pronounced in different ways, which requires effort
during text pre-processing. This paper discusses various aspects of the Festival and Festvox frameworks in a Linux environment and their use for the
implementation of a TTS system for the Devanagari Konkani language. Festival does not provide complete language processing support for many
languages. Experimental results on a text segment of 100 Konkani sentences show that a word phonetization accuracy of 64% is obtained,
indicating scope for improving the quality of the output speech if the voice segments used are larger than those of unit selection voices.
Keywords: TTS; Speech Synthesis; Devanagari; Konkani; Festival; Festvox
I. INTRODUCTION
India is a multilingual country with a variety of scripts and
hundreds of spoken dialects. It is desirable that information and
ICT-based services be delivered to a large portion of the
population in their own language in the form of voice. A lot of
research is currently being carried out in the area of text to
speech processing for many Indian languages, and speech synthesis
systems for Indian languages are in great demand. The most
important qualities of a speech synthesis system are naturalness
and intelligibility. Naturalness describes how closely the output
sounds like human speech, while intelligibility is the ease with
which the output is understood. An ideal speech synthesizer is
both natural and intelligible, and hence speech synthesis
systems usually try to maximize both characteristics [1].
The intelligibility of synthetic speech has now reached an adequate
level for most applications, especially those serving the visually
challenged and the illiterate.
Konkani, the official language of the State of Goa in India,
is also a minority language in States such as Karnataka,
Kerala and Maharashtra. Konkani is spoken by
about 3.6 million people and is written in both the Devanagari and
Roman scripts.
No concrete work has been carried out for Konkani in the area of
text to speech. The focus of this work is to study the tools,
resources and techniques for text to speech processing that have
been developed for Indian languages and to implement a TTS
system for Devanagari Konkani using Festival in a Linux
environment. Festival is widely used for implementing
TTS systems for many languages [2] [3] [4].
The paper is organized as follows: Sections II and III outline
the TTS system and the synthesis techniques. Sections IV, V
and VI detail the Festival framework, its implementation for
Konkani, and the evaluation and discussion of the implemented
system, respectively, followed by the conclusion at the end.
II. GENERAL FUNCTIONAL DIAGRAM OF A TEXT-TO-SPEECH SYSTEM
The general architecture of a corpus-based TTS system is
depicted in Figure 1. Speech synthesis mainly uses two
processing components: the NLP (Natural Language
Processing) module and the DSP (Digital Signal Processing) module
[5] [6].
Figure 1: The general architecture of a corpus-based TTS system
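The split between the two components can be illustrated with a toy pipeline in which an NLP front end turns Devanagari text into a phone sequence and a DSP back end turns that sequence into samples by concatenating stored units. This is a minimal sketch under loudly stated assumptions: the lexicon, the phone labels (e.g. "o~", "nn", "pau"), the transcriptions and the "waveforms" (short lists of floats) are all hypothetical, and none of it reflects Festival's internal data structures or the authors' implementation.

```python
"""Toy two-stage TTS pipeline: NLP front end -> DSP back end.

All data below (lexicon entries, phone labels, "waveforms") is hypothetical
and only illustrates the division of labour; it is not Festival's format.
"""
from typing import Dict, List

# Hypothetical pronunciation lexicon: Devanagari word -> phone labels.
# The transcriptions are illustrative, not a validated Konkani phone set.
LEXICON: Dict[str, List[str]] = {
    "कोंकणी": ["k", "o~", "k", "a", "nn", "ii"],
    "गोंय": ["g", "o~", "y"],
}

# Hypothetical speech database: one tiny "waveform" (list of samples) per phone.
UNIT_DB: Dict[str, List[float]] = {
    "pau": [0.0, 0.0],
    "k": [0.4],
    "o~": [0.2, 0.2],
    "a": [0.5, 0.5],
    "nn": [0.6],
    "ii": [0.7, 0.7],
    "g": [0.1],
    "y": [0.3],
}

# Characters stripped from token edges, including the Devanagari danda.
PUNCTUATION = "।.,!?"


def nlp_front_end(text: str) -> List[str]:
    """Tokenize the text and map each word to phones via the lexicon.

    A pause ("pau") is inserted at the start and after every word.
    """
    phones: List[str] = ["pau"]
    for token in text.split():
        word = token.strip(PUNCTUATION)
        if not word:
            continue  # the token was pure punctuation
        if word not in LEXICON:
            # A real front end would fall back to letter-to-sound rules here.
            raise ValueError(f"out-of-vocabulary word: {word!r}")
        phones.extend(LEXICON[word])
        phones.append("pau")
    return phones


def dsp_back_end(phones: List[str]) -> List[float]:
    """Concatenate the stored unit for each phone into one sample list."""
    samples: List[float] = []
    for phone in phones:
        if phone not in UNIT_DB:
            raise ValueError(f"no unit recorded for phone {phone!r}")
        samples.extend(UNIT_DB[phone])
    return samples


def synthesize(text: str) -> List[float]:
    """Full pipeline: linguistic processing followed by signal generation."""
    return dsp_back_end(nlp_front_end(text))


if __name__ == "__main__":
    print(nlp_front_end("कोंकणी गोंय।"))
    print(len(synthesize("कोंकणी गोंय।")))
```

In this sketch all language-specific knowledge (tokenization, punctuation such as the danda, the lexicon) lives in the front end, while the back end only knows how to fetch and join units; an out-of-vocabulary word is simply rejected where a real system would apply letter-to-sound rules.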
This schematic applies to every data-driven (i.e., corpus-based)
TTS system, regardless of the underlying
technology (e.g., unit selection or parametric). The NLP
component accounts for every aspect of the linguistic
processing of the input text, whereas the DSP component
accounts for the speech signal manipulation and the output
generation. For a unit selection TTS system, besides the speech units
(usually diphones), the speech database contains all the
data necessary for the unit selection stage of synthesis.
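The unit selection stage is commonly formulated as a search for the sequence of database units that minimizes the sum of target costs (how well a candidate matches the desired specification) and join costs (how smoothly adjacent candidates concatenate), solved with the Viterbi algorithm. The sketch below illustrates that general formulation over diphone candidates. It is an assumption-laden example, not Festival's clunits or multisyn code: the `Unit` fields, the F0-based cost functions, the `join_weight` parameter and the example database are all hypothetical.

```python
"""Unit selection as a Viterbi search over candidate diphone units.

Sketch only: the Unit fields, the F0-based target and join costs and the
example database are hypothetical, not Festival's clunits/multisyn code.
"""
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Unit:
    uid: str         # hypothetical identifier of a unit in the speech database
    diphone: str     # diphone label, e.g. "k-a"
    pitch: float     # mean F0 of the unit in Hz (made-up target feature)
    start_f0: float  # F0 at the unit's left edge (used by the join cost)
    end_f0: float    # F0 at the unit's right edge (used by the join cost)


def to_diphones(phones: List[str]) -> List[str]:
    """Turn a phone sequence into overlapping diphone labels."""
    return [f"{left}-{right}" for left, right in zip(phones, phones[1:])]


def target_cost(unit: Unit, target_pitch: float) -> float:
    """Distance between a candidate and the desired prosody."""
    return abs(unit.pitch - target_pitch)


def join_cost(prev: Unit, cur: Unit) -> float:
    """Pitch mismatch at the concatenation point."""
    return abs(prev.end_f0 - cur.start_f0)


def select_units(targets: List[Tuple[str, float]],
                 db: Dict[str, List[Unit]],
                 join_weight: float = 1.0) -> List[Unit]:
    """Return the unit sequence minimizing total target + weighted join cost."""
    if not targets:
        return []
    layers: List[List[Unit]] = []
    for diphone, _ in targets:
        candidates = db.get(diphone, [])
        if not candidates:
            raise ValueError(f"no candidate units for diphone {diphone!r}")
        layers.append(candidates)

    # cost[k]: cheapest path cost ending at candidate k of the current layer.
    cost = [target_cost(u, targets[0][1]) for u in layers[0]]
    backptrs: List[List[int]] = []
    for i in range(1, len(layers)):
        prev_layer, new_cost, new_back = layers[i - 1], [], []
        for unit in layers[i]:
            # Best predecessor for this candidate (ties go to the lower index).
            scores = [cost[j] + join_weight * join_cost(p, unit)
                      for j, p in enumerate(prev_layer)]
            best_j = min(range(len(scores)), key=scores.__getitem__)
            new_cost.append(scores[best_j] + target_cost(unit, targets[i][1]))
            new_back.append(best_j)
        cost = new_cost
        backptrs.append(new_back)

    # Trace back from the cheapest candidate in the final layer.
    k = min(range(len(cost)), key=cost.__getitem__)
    path = [k]
    for back in reversed(backptrs):
        k = back[k]
        path.append(k)
    path.reverse()
    return [layer[k] for layer, k in zip(layers, path)]


# Hypothetical example: two candidates per diphone of "pau k a pau".
EXAMPLE_DB: Dict[str, List[Unit]] = {
    "pau-k": [Unit("u1", "pau-k", 100.0, 100.0, 100.0),
              Unit("u2", "pau-k", 100.0, 100.0, 140.0)],
    "k-a": [Unit("u3", "k-a", 110.0, 140.0, 110.0),
            Unit("u4", "k-a", 112.0, 100.0, 112.0)],
    "a-pau": [Unit("u5", "a-pau", 100.0, 110.0, 100.0),
              Unit("u6", "a-pau", 100.0, 130.0, 100.0)],
}

if __name__ == "__main__":
    labels = to_diphones(["pau", "k", "a", "pau"])
    targets = list(zip(labels, [100.0, 110.0, 100.0]))
    print([u.uid for u in select_units(targets, EXAMPLE_DB)])
```

On this toy database the search picks u2, u3 and u5. Units u1 and u2 have identical target costs, so it is the join cost that prefers u2, whose right-edge F0 matches the left edge of u3; with `join_weight=0.0` the search falls back to u1, u3 and u5. Real systems use far richer target features (phonetic context, position, duration) and spectral join costs, but the dynamic-programming structure is the same.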
In particular, the NLP component is mainly responsible for
parsing, analysing and transforming the input text into an