International Journal on Islamic Applications in Computer Science And Technology, Vol. 4, Issue 2, June 2016, 36-41

Towards A Minimal Phonetic Set for Quran Recitation

Husni A. Al-Muhtaseb 1, Sameh A. Bellegdi 2
1 Information and Computer Science Department, KFUPM, Dhahran 31261, Saudi Arabia
2 Academic Leadership Center, Ministry of Education, Saudi Arabia
1 muhtaseb@kfupm.edu.sa, 2 bellegdi@kfupm.edu.sa

Abstract

Speech is the most important interaction mechanism between human beings. The Text-to-Speech synthesis problem has been addressed by many researchers for different languages. However, the Arabic language has not received as much attention. This paper addresses a computational linguistic aspect of phonetically transcribed, syllabified Quranic text that is essential for developing a speech synthesis prototype. The main objective of this work is to find a set of Quran verses (Ayat) that contains the complete set of distinct syllables. An algorithm is proposed to find a reduced set of Quran verses that contains all Quran syllables. One of the motivations for this work is compressing the sound files of Quran recitation. The current work proposes a technique to extract a reduced phonetic set of Quran recitation that can be used to develop a Text-to-Speech system. It is found that, out of the 211,573 syllables that make up the Quran, there are 2,642 distinct syllables, which represent less than 1.25% of the total number of Quranic syllables. In addition, a reduced set of Quran verses that contains the whole set of distinct syllables is identified. The extracted set of verses represents around 16% of Quran verses.

Keywords: Quranic text; phonetic set; Quran recitation; reduced Quranic phonetic set; text statistics; Quran recitation synthesis

1. Introduction

The Text-to-Speech synthesis (TTS) research field has received a lot of attention from the research community. However, only limited research has addressed the problem of Arabic speech synthesis.
One of the main motivations for this work is to reduce the size of the sound files of Quran recitation.

A TTS system transforms text to speech in two main phases: text analysis and speech signal generation. Figure 1 shows the four modules of a TTS system: text analysis, phonetic analysis, prosodic analysis, and speech synthesis. Text analysis includes preprocessing steps that replace numbers and abbreviations with their corresponding words or phrases. Phonetic analysis (also called phonetic transcription or grapheme-to-phoneme conversion) transforms text into phonemes. The prosodic analysis module is responsible for attaching stress and intonation features. The speech synthesis module is responsible for generating speech (Dutoit, 1997). The work presented in this paper deals with the phonetic analysis phase.

Figure 1. TTS Modules
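
The objective stated in the abstract, finding a reduced set of verses that together contain every distinct syllable, can be viewed as an instance of the set-cover problem. The sketch below illustrates that view with the standard greedy heuristic. It is not necessarily the algorithm proposed in this paper: the function name `greedy_verse_cover`, the verse identifiers, and the syllable labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_verse_cover(verse_syllables: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: pick verses until every distinct syllable is covered.

    verse_syllables maps a (hypothetical) verse id to the set of syllables
    that occur in that verse.
    """
    # Every distinct syllable that appears anywhere in the input.
    uncovered: Set[str] = set().union(*verse_syllables.values())
    chosen: List[str] = []
    while uncovered:
        # Take the verse covering the most still-uncovered syllables.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(
            sorted(verse_syllables),
            key=lambda v: len(verse_syllables[v] & uncovered),
        )
        chosen.append(best)
        uncovered -= verse_syllables[best]
    return chosen


# Toy input with made-up syllable labels (not real Quranic data).
toy = {
    "v1": {"bis", "mil", "laa", "hi"},
    "v2": {"bis", "mil"},
    "v3": {"rah", "maa", "ni"},
    "v4": {"hi", "rah"},
}
selected = greedy_verse_cover(toy)
print(selected)  # ['v1', 'v3']
```

The greedy heuristic does not guarantee the smallest possible verse set, but it always returns a set that covers every syllable present in the input, which is the property the reduced set needs.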