© 2015, IJARCSSE All Rights Reserved Page | 475
Volume 5, Issue 10, October-2015 ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com

A Review of Unit Selection Speech Synthesis

Sangramsing Kayte
Department of Computer Science & Information Technology
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India

Monica Mundada
Department of Computer Science & Information Technology
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India

Dr. Charansing Kayte
Assistant Professor, Department of Digital and Cyber Forensic, Aurangabad, Maharashtra, India

Abstract— Speech is used to express information, emotions, and feelings. Speech synthesis is the technique of converting input text into synthetic speech. It can be used to read text aloud, such as SMS messages, newspapers, and website content, and it is especially useful for blind users. Speech synthesis has been widely researched over the last four decades, and the quality and intelligibility of the synthetic speech it produces are now good enough for most applications. This paper reviews four widely researched methods of speech synthesis, viz. articulatory, concatenative, formant, and quasi-articulatory synthesis, with the main focus on concatenative synthesis and some of its issues. Articulatory synthesis is based on a model of human speech production. The synthetic speech it produces is the most natural, but it is also the most difficult method to implement. Concatenative synthesis joins prerecorded words and phrases to produce speech. It is the simplest method and yields high-quality speech, but it is limited by the memory needed to store in advance every word and phrase to be produced. Formant synthesis is based on an acoustic model of the human speech production system.
It models the sound source and the resonances of the vocal tract, and it is the most commonly used model. Quasi-articulatory synthesis is a hybrid of the articulatory and acoustic models of speech production. The speech it produces sounds more natural and can easily be customized to the requirements of different applications and individual users.

Keywords— Unit selection speech synthesis, articulatory synthesizer, formant synthesizer, concatenative synthesizer.

I. INTRODUCTION

Speech synthesis is the automatic generation of speech by machines or computers. Its goal is a machine with an intelligible, natural-sounding voice that conveys information to a user in a desired accent, language, and voice.

Unit selection synthesis, also referred to as corpus-based synthesis, uses a large speech database. During database creation, each recorded utterance is segmented into some or all of the following units: individual phones, syllables, morphemes, words, phrases, and sentences. An index of the units in the database is then built from this segmentation and from acoustic parameters such as fundamental frequency, pitch, duration, syllable status, and the previous and next phones. This method produces more natural output speech than other techniques.

Unit selection synthesis, shown in Fig. 1, is a type of concatenative synthesis in which the longest matching speech segments available in the corpus are concatenated to synthesize the target speech. It can manage a large number of units [1] and also conveys prosody beyond the role of F0. A clear distinction must be made between F0 and pitch: F0 is the actual frequency at which the vocal folds (vocal cords) vibrate, while pitch is the listener's perception of that frequency. Hence the two need not be equal. This synthesis technique also preserves the naturalness of the generated speech sounds.
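The indexing and selection steps described above can be illustrated with a minimal Python sketch. It assumes a toy corpus in which each unit is a single phone annotated with its mean F0, duration, and neighboring phones. For the selection step it swaps in the widely used formulation that scores each candidate unit with a target cost and each join with a concatenation (join) cost, then finds the cheapest unit sequence by dynamic programming (Viterbi search). All class names, function names, cost weights, and scaling constants are hypothetical illustrations, not the method of any particular system reviewed in this paper. A join between units that were adjacent in the same recorded utterance costs nothing, which is one simple way for the search to favor long contiguous stretches of natural speech.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded phone in a hypothetical speech corpus."""
    utt: int          # utterance the unit was cut from
    pos: int          # position of the unit within that utterance
    phone: str
    f0: float         # mean fundamental frequency in Hz
    duration: float   # seconds
    prev_phone: str
    next_phone: str


@dataclass(frozen=True)
class Target:
    """Desired specification for one position of the output."""
    phone: str
    f0: float
    duration: float
    prev_phone: str
    next_phone: str


def build_index(units):
    """Index corpus units by phone label so candidates can be looked up quickly."""
    index = defaultdict(list)
    for unit in units:
        index[unit.phone].append(unit)
    return index


def target_cost(target, unit):
    # Hypothetical weights: 50 Hz of F0 mismatch or 50 ms of duration
    # mismatch each cost 1.0; each mismatched neighboring phone costs 1.0.
    cost = abs(target.f0 - unit.f0) / 50.0
    cost += abs(target.duration - unit.duration) / 0.05
    cost += 0.0 if target.prev_phone == unit.prev_phone else 1.0
    cost += 0.0 if target.next_phone == unit.next_phone else 1.0
    return cost


def join_cost(left, right):
    # Units that were adjacent in the same recording join seamlessly.
    if left.utt == right.utt and right.pos == left.pos + 1:
        return 0.0
    # Otherwise pay a fixed penalty plus an F0 discontinuity term.
    return 1.0 + abs(left.f0 - right.f0) / 50.0


def select_units(targets, index):
    """Viterbi search for the unit sequence with the lowest total cost."""
    candidates = [index.get(t.phone, []) for t in targets]
    if not targets or any(not c for c in candidates):
        raise ValueError("every target phone needs at least one candidate unit")

    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            totals = [best[i - 1][k][0] + join_cost(prev, unit)
                      for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(totals)), key=totals.__getitem__)
            row.append((totals[k_best] + target_cost(targets[i], unit), k_best))
        best.append(row)

    # Trace the backpointers from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path
```

For instance, with a corpus holding one utterance "h e l o" near 120 Hz and a second, higher-pitched "h e" near 200 Hz, targets for "h e l" at about 120 Hz select the three contiguous units of the first utterance, since they match the target F0 and need no penalized joins.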
Choosing the unit length is an important task in concatenative speech synthesis. A shorter unit requires less storage space, but collecting and labeling samples becomes more difficult and complex. A longer unit gives more naturalness [2], better coarticulation, and fewer concatenation points, but requires more memory. Candidate units for TTS include phonemes, diphones, triphones, demisyllables, syllables, and words [3][4].

Fig. 1 Unit Selection Synthesis system

Unit-selection speech synthesis has become increasingly popular because of its better prosodic quality and naturalness compared with parametric or diphone synthesizers. The principle is based on the concatenation of naturally-produced