International Journal of Speech Technology
https://doi.org/10.1007/s10772-018-9499-4
A new hybrid approach for speech synthesis: application to the Arabic language
Hanane Tebbi¹ · Maamar Hamadouche² · Hamid Azzoune¹
Received: 15 July 2017 / Accepted: 4 March 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract
This research belongs to the field of automatic speech synthesis (ASS) and studies voice production from text written in Arabic. Our principal purpose is the design of a new hybrid approach that brings the advantages of artificial intelligence to ASS through expert systems (ES). We describe the methodology followed to design the approach and present its principal realization steps, summarized as follows: (1) creation of the sound base from the elaborated corpus; (2) linguistic processing, which converts the written form of the text into its spoken form; and (3) acoustic generation of the speech corresponding to the previously acquired text. The adopted approach is based on a conceptual analysis of the principal steps needed to design our speech synthesis ES. Finally, we present the system evaluation report and explain the obtained results.
Keywords Text to speech · Expert system · Knowledge base · Prolog inference engine · Standard Arabic
1 Introduction
In recent years, new approaches to concatenative synthesis have emerged. Gains in computer performance, in terms of computing speed and available RAM, now make it practical to use large dictionaries (more than 1 h of speech). As a result, unit selection, which is essentially static in traditional concatenative systems, becomes dynamic in this new generation of systems. Indeed, to limit the size of the dictionary, conventional systems used a single acoustic realization of each sound unit, carefully chosen during dictionary design, whereas new systems store several acoustic realizations of the same unit and choose among them at synthesis time. Unfortunately, this technology, in its present state, does not meet all expectations. It is hindered by at least two major constraints: on the one hand, the cumbersome process of creating new voices, and on the other hand, the restriction to speech in a "neutral reading" style, which hampers research in this domain. In practice, these constraints tend to lock the voice catalog and to limit the integration of expressive components (Cadic 2011). There is therefore a strong temptation to use parametric and hybrid synthesis techniques such as Hidden Markov Model (HMM) synthesis, expert systems (ES), or deep neural networks (DNN). These techniques rely on machine learning and speech generation models.
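The shift from one stored realization per unit to several candidate realizations chosen dynamically is usually formalized as a search that balances a target cost (how well a recording matches the requested prosody) against a join cost (how smoothly two recordings connect). The sketch below is a generic, textbook-style Viterbi unit selection, not the method of this paper; the `Candidate` fields, the cost formulas, and their normalizing constants (50 Hz, 50 ms) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One recorded realization of a sound unit (hypothetical fields)."""
    unit: str        # phonetic label of the unit
    pitch: float     # mean F0 in Hz
    duration: float  # length in seconds


def target_cost(c: Candidate, want_pitch: float, want_dur: float) -> float:
    # Distance between the recorded realization and the requested prosody.
    # The normalizing constants (50 Hz, 50 ms) are arbitrary choices.
    return abs(c.pitch - want_pitch) / 50.0 + abs(c.duration - want_dur) / 0.05


def join_cost(a: Candidate, b: Candidate) -> float:
    # Penalize pitch discontinuities at the concatenation point.
    return abs(a.pitch - b.pitch) / 50.0


def select_units(targets: List[Tuple[str, float, float]],
                 inventory: Dict[str, List[Candidate]]) -> List[Candidate]:
    """Viterbi search for the candidate sequence with minimal total cost."""
    layers = [inventory[u] for u, _, _ in targets]
    _, p0, d0 = targets[0]
    # best[i][j] = (cheapest cost of a path ending in candidate j of target i, backpointer)
    best = [[(target_cost(c, p0, d0), -1) for c in layers[0]]]
    for i in range(1, len(targets)):
        _, p, d = targets[i]
        row = []
        for c in layers[i]:
            tc = target_cost(c, p, d)
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, c) + tc, k)
                for k, prev in enumerate(layers[i - 1])
            ))
        best.append(row)
    # Trace back from the cheapest candidate of the last target.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(layers[i][j])
        j = best[i][j][1]
    return path[::-1]


# Usage with a tiny hypothetical inventory holding two realizations per unit.
inventory = {
    "ba": [Candidate("ba", 120, 0.20), Candidate("ba", 180, 0.15)],
    "ta": [Candidate("ta", 125, 0.18), Candidate("ta", 175, 0.16)],
}
chosen = select_units([("ba", 130, 0.18), ("ta", 130, 0.18)], inventory)
print([c.pitch for c in chosen])  # [120, 125]
```

A conventional single-realization system corresponds to an inventory with exactly one candidate per unit, in which case the search degenerates to a lookup.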
The aim of this work is to design a speech synthesis system for Arabic text using an ES. We introduce our Text-to-Speech system for spoken Arabic, which is based on an ES whose purpose is to model vocal knowledge in order to build a robust system that can correctly read text written in Standard or dialectal Arabic. The input to the system is vocalized Arabic text (i.e., text carrying the diacritics that mark short vowels). Based on orthographic phonetic transcription (OPT), the system transforms this text into phonetic codes and then uses the recorded acoustic units to generate continuous speech output.
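The chain from vocalized text to phonetic codes to concatenated units can be illustrated with a minimal sketch. It is not the authors' expert system: the paper's system encodes this knowledge in an ES (its keywords mention a Prolog inference engine), whereas here the rules are plain lookup tables. The rule subset (a handful of consonants, the three short vowels, shadda, sukun, and fatha followed by alif), the functions `transcribe` and `concatenate`, the assumption that the sound base is keyed by the same phonetic codes, and the linear cross-fade are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical rule subset: consonant letter -> phonetic code.
CONSONANTS = {
    "\u0628": "b",  # beh
    "\u062A": "t",  # teh
    "\u062F": "d",  # dal
    "\u0631": "r",  # reh
    "\u0633": "s",  # seen
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
    "\u0646": "n",  # noon
}
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN, ALIF = "\u0651", "\u0652", "\u0627"
SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i"}
MARKS = {FATHA, DAMMA, KASRA, SHADDA, SUKUN}


def transcribe(text: str) -> List[str]:
    """Rule-based transcription of fully vocalized Arabic into phonetic codes."""
    codes: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch.isspace():
            continue  # word boundaries are ignored in this sketch
        if ch == ALIF and codes and codes[-1] == "a":
            codes[-1] = "aa"  # fatha followed by alif: long vowel
            continue
        if ch not in CONSONANTS:
            raise ValueError(f"no rule for U+{ord(ch):04X}")
        # Gather the diacritics attached to this consonant. Their order in the
        # string is irrelevant (Unicode normalization puts fatha before shadda).
        marks = set()
        while i < len(text) and text[i] in MARKS:
            marks.add(text[i])
            i += 1
        codes.append(CONSONANTS[ch])
        if SHADDA in marks:
            codes.append(CONSONANTS[ch])  # gemination doubles the consonant
        for mark, vowel in SHORT_VOWELS.items():
            if mark in marks:
                codes.append(vowel)
    return codes


def concatenate(units: List[str], sound_base: Dict[str, List[float]],
                overlap: int = 32) -> List[float]:
    """Join recorded unit waveforms with a short linear cross-fade."""
    out: List[float] = []
    for u in units:
        wav = sound_base[u]
        n = min(overlap, len(out), len(wav))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[start + k] = (1 - w) * out[start + k] + w * wav[k]
        out.extend(wav[n:])
    return out


print(transcribe("\u0643\u0650\u062A\u064E\u0627\u0628\u0652"))  # ['k', 'i', 't', 'aa', 'b']
```

The example input spells the word kitaab ("book") with kasra, fatha, alif, and a final sukun. Tanwin, hamza forms, the definite article, and the remaining letters are deliberately left out of this sketch.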
At present, few works have addressed the problem of speech synthesis for the Arabic language using an expert system or any other artificial intelligence (AI) tool. In our opinion, this is due to the complexity of the Arabic
* Hanane Tebbi
htebbi@ushb.dz; tebbi_hanane@yahoo.fr
Maamar Hamadouche
hamadouchemaamar@yahoo.fr
Hamid Azzoune
azzoune@yahoo.fr; hazzoune@usthb.dz
¹ LRIA, Option: Knowledge Representation and Inference Systems, USTHB, Algiers, Algeria
² University SAAD Dahleb Blida, Blida, Algeria