A MANUAL SYSTEM TO SEGMENT AND TRANSCRIBE ARABIC SPEECH

M. Alghamdi¹, Y. O. Mohamed El Hadj², M. Alkanhal¹

¹ King Abdulaziz City for Science and Technology, PO Box 6086, Riyadh 11442, Saudi Arabia. Email: {mgamdi,mkanhal}@kacst.edu.sa
² Imam Med Bin Saud Islamic University, PO Box 8488, Riyadh 11681, Saudi Arabia. Email: yelhadj@ccis.imamu.edu.sa

ABSTRACT

In this paper, we present our first work in the "Computerized Teaching of the Holy Quran" project, which aims to assist the memorization of the Noble Quran using speech recognition techniques. Building a high-performance speech recognition system for this purpose requires accurate acoustic models. Since no annotated speech corpus of Quranic sounds was yet available, we collected speech data from reciters memorizing the Quran and then focused on labeling and segmenting it. This made it necessary to propose a new labeling scheme able to cover all Quranic sounds and their phonological variations. We present a set of labels that covers all Arabic phonemes and their allophones and show how it can be used efficiently to segment our Quranic corpus.

Index Terms— Quran; Arabic; transcription; speech; recognition

1. INTRODUCTION

Human-machine interaction is shifting from buttons and screens toward speech, and speech recognition is an important element of this interaction. Building a speech recognition system, however, requires a speech database. Such a database is essential not only for speech recognition but also for other systems such as speaker verification and speech synthesis. This is one reason speech databases have been collected for many languages, including English [1], Spanish [2], Dutch [3], Mandarin [4], French [5] and Arabic [6].
Although the recited Quran is not used for everyday communication, it is important for teaching the pronunciation of Classical Arabic sounds, and it is indispensable in Islamic worship such as prayer. Quranic recitation has traditionally been taught by teachers who pronounce its sounds accurately, a method practiced since the revelation of the Quran. This paper is part of a project to build a speech recognition system that can teach learners to pronounce Quranic sounds and correct them when they make mistakes. Before such a system can be built, however, a speech database of recited Quran is needed in which the sounds are labeled and segmented.

Recent speech databases are transcribed at several levels, ranging from phonemes to intonation. Besides being transcribed, the speech is aligned with its acoustic signal [7, 8]. Transcription and alignment can be done manually, automatically, or by combining both, in which case manual transcription is used to verify the automatic one [7, 9]. This paper presents a new set of transcription labels that are more convenient for transcribers and suitable for speech recognition tools such as the Hidden Markov Model Toolkit (HTK) [10]. At the same time, they cover all Arabic sounds, including those of Modern Standard Arabic, Arabic dialects and Classical Arabic.

2. SOUND LABELS

The most appropriate symbols for accurate speech transcription are those of the International Phonetic Alphabet (IPA), since they represent the speech sounds of all languages and their dialects [11]. However, they are rarely used in speech databases because most language-processing programs and speech tools, such as HTK, do not recognize them. On the other hand, a language's orthography does not represent all of its sounds, so it is not used on its own for transcription.
Therefore, other symbols available on the keyboard, such as @ and >, are used for transcription, in addition to combinations of two characters such