International Journal of Speech Technology
https://doi.org/10.1007/s10772-018-9499-4

A new hybrid approach for speech synthesis: application to the Arabic language

Hanane Tebbi 1 · Maamar Hamadouche 2 · Hamid Azzoune 1

Received: 15 July 2017 / Accepted: 4 March 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
This research belongs to the field of automatic speech synthesis (ASS); it studies voice production from a text written in the Arabic language. Our principal purpose is the design of a new hybrid approach that brings the advantages of artificial intelligence to the field of ASS by using expert systems (ES). We describe the methodology followed for the design of the approach, and we present its principal realization steps, which are summarized as follows: (1) the creation of the sound base from the elaborated corpus; (2) the linguistic processing, which is responsible for converting the written form of the text to its spoken form; and (3) the acoustic generation corresponding to the previously acquired text. The adopted approach is based on a conceptual analysis of the principal steps needed for the design of our speech synthesis ES. Finally, we present the system evaluation report and explain the obtained results.

Keywords Text to speech · Expert system · Knowledge base · Prolog inference engine · Standard Arabic

1 Introduction

In recent years, new approaches to synthesis by concatenation have emerged. The increase in computer performance, in terms of computing speed and quantity of RAM available, has facilitated the use of large dictionaries (more than 1 h of speech); unit selection, which is essentially static in traditional systems of synthesis by concatenation, becomes dynamic in this new generation of systems.
Indeed, in order to limit the size of the dictionary, conventional systems used a single acoustic realization of each sound unit, carefully selected during the dictionary design process, whereas new systems use several acoustic realizations of the same unit. Unfortunately, this technology, in its present state, does not meet all expectations. It is held back by at least two major constraints: on the one hand, the heaviness of the methods for creating new voices, and on the other hand, the restriction of the speech to a "neutral reading" style, which hampers research in this domain. In practice, these constraints tend to lock the voice catalog and to limit how far expressive components are taken into account (Cadic 2011). Therefore, there is a great temptation to use parametric and hybrid synthesis techniques such as Hidden Markov Model (HMM) synthesis, expert systems (ES), or deep neural networks (DNN). These techniques, nevertheless, are based on machine learning and speech generation models.

The aim of this work is the design of a speech synthesis system for Arabic text using an ES. We introduce our Text-to-Speech system for spoken Arabic, which is based on an ES whose purpose is to model vocal knowledge in order to build a robust system that can correctly read a text written in Standard or Dialectal Arabic. The input to the system is a vocalized Arabic text. The system then, based on the orthographical phonetic transcription (OPT), transforms the text into phonetic codes and uses the recorded acoustic units to generate continuous speech output.
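As a rough illustration of the processing chain just described (vocalized text, then phonetic codes, then concatenation of recorded units), the following Python sketch may help. It is only a toy under stated assumptions: the tables `GRAPHEME_TO_CODE` and `UNIT_BANK`, the functions `transcribe` and `synthesize`, and the sample values are all hypothetical and are not the transcription rules or the sound base of the system presented in this paper.

```python
from typing import Dict, List

# Hypothetical grapheme-to-phonetic-code table (toy subset, NOT the paper's OPT rules).
GRAPHEME_TO_CODE: Dict[str, str] = {
    "\u0643": "k",  # ARABIC LETTER KAF
    "\u062A": "t",  # ARABIC LETTER TEH
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u064E": "a",  # ARABIC FATHA (short vowel diacritic)
}

# Hypothetical unit bank: each phonetic code maps to a few fake waveform samples
# standing in for a recorded acoustic unit.
UNIT_BANK: Dict[str, List[float]] = {
    "k": [0.1, 0.2],
    "t": [0.3],
    "b": [0.4, 0.5],
    "a": [0.0],
}


def transcribe(vocalized_text: str) -> List[str]:
    """Map each character of a vocalized text to a phonetic code."""
    codes: List[str] = []
    for ch in vocalized_text:
        if ch.isspace():
            continue  # word boundaries are ignored in this toy version
        if ch not in GRAPHEME_TO_CODE:
            raise ValueError(f"no transcription rule for U+{ord(ch):04X}")
        codes.append(GRAPHEME_TO_CODE[ch])
    return codes


def synthesize(codes: List[str]) -> List[float]:
    """Concatenate the stored samples of each unit, in order."""
    samples: List[float] = []
    for code in codes:
        samples.extend(UNIT_BANK[code])
    return samples


if __name__ == "__main__":
    # KAF FATHA TEH FATHA BEH FATHA (the vocalized word "kataba")
    word = "\u0643\u064E\u062A\u064E\u0628\u064E"
    codes = transcribe(word)
    print(codes)
    print(len(synthesize(codes)), "samples")
```

A real system would replace the one-character lookup with context-dependent transcription rules and would smooth the joins between units; the sketch only shows the order of the two stages.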
* Hanane Tebbi: htebbi@ushb.dz; tebbi_hanane@yahoo.fr
Maamar Hamadouche: hamadouchemaamar@yahoo.fr
Hamid Azzoune: azzoune@yahoo.fr; hazzoune@usthb.dz
1 LRIA, Option: Knowledge Representation and Inference Systems, USTHB, Algiers, Algeria
2 University SAAD Dahleb Blida, Blida, Algeria

At present, few works have addressed the problem of speech synthesis for the Arabic language using an expert system or any other tool of artificial intelligence (AI). This is due, in our opinion, to the complexity of the Arabic