Syllable Based Transcription of English Words into Perso-Arabic Writing System

Jalal Maleki
Dept. of Computer and Information Science
Linköping University
SE-581 83 Linköping, Sweden
Email: jma@ida.liu.se

Abstract

This paper presents a rule-based method for transcription of English words into the Perso-Arabic orthography. The method relies on a phonetic representation of English words, such as the one provided by the CMU pronunciation dictionary. Two of the challenging problems are the context-based representation of vowels in the Perso-Arabic writing system and the mismatch between the syllabic structures of English and Persian. With some minor extensions, the method can be applied to English to Arabic transliteration as well.

1 Introduction

During translation from English to Persian, certain words (usually names and trademarks) are transcribed rather than translated. This is a general issue in machine translation between language pairs. Unfortunately, there are no guidelines as to how these words should be written in the Perso-Arabic Script (PA-Script), and some words are written in more than 10 different ways ([9]).

This paper introduces a rule-based method for English to PA-Script transcription which is based on the syllable structure of words. Syllables are important because the transcription of a vowel is mainly determined by the structure of the syllable in which the vowel appears. Given an English word, we use a syllabified version of the CMU pronunciation dictionary (CMUPD) to look up its pronunciation. From this pronunciation we generate a phonemic romanized Persian transcription of the word, which is finally resyllabified and transcribed into PA-Script according to the syllabification-based method described in [11]. The romanization scheme we use is the Dabire-romanization described in [10].

Since Arabic and Persian essentially use the same script and have the same syllabic structure, our method can easily be extended to the Arabic script.
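As a rough illustration of the lookup step, the sketch below parses one entry in the plain (unsyllabified) CMU dictionary format, where a word is followed by its phonemes and vowels carry a stress digit. The function names `parse_entry` and `strip_stress` are hypothetical, and the syllabified version of the dictionary used in this paper may be laid out differently.

```python
# Minimal sketch, assuming the plain CMU dictionary line format:
# a word followed by whitespace-separated phonemes, with a stress
# digit (0, 1 or 2) attached to each vowel.
# parse_entry and strip_stress are hypothetical helper names.

def parse_entry(line):
    """Split a dictionary line into (word, list of phonemes)."""
    word, *phones = line.split()
    return word, phones

def strip_stress(phone):
    """Remove a trailing stress digit from a vowel symbol, if present."""
    return phone.rstrip("012")

word, phones = parse_entry("THANKS  TH AE1 NG K S")
bare = [strip_stress(p) for p in phones]
```

The stress digits are stripped here only to show how the bare phoneme symbols can be recovered. Whether stress is used by the later transcription steps is not stated in this section.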
2 Phonological Issues

The essence of our method is a phonological mapping between English and Persian. It consists of a phonemic mapping of consonants and vowels, followed by resyllabification of the source word under Persian syllable constraints. Just like transliteration between Arabic and English ([2]), transcription between English and Persian is a difficult task. However, although the mapping between the sounds of Persian and English consonants and vowels is non-trivial, the most complicated step is the conversion of Persian vowels to PA-Script [11].

2.1 Consonants

Mapping English consonants into Persian phonology is straightforward and can be summarized as a lookup operation. The mapping is, however, imperfect: in many cases a consonant is mapped into a Persian consonant that only approximately reflects its original pronunciation. For example, the /th/ in 'thanks' (/TH, AE1, NG, K, S/) is transcribed to /t/, whereas the /th/ of 'that' (/DH, AE1, T/) is transcribed to Persian /d/.
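The lookup operation can be sketched as a small table-driven function. Only the two mappings TH to /t/ and DH to /d/ are taken from the text. The other table entries, the name `CONSONANT_MAP`, and the function `map_consonants` are illustrative assumptions and not the paper's actual mapping table.

```python
# Minimal sketch of consonant mapping as a lookup.
# Only TH -> t and DH -> d come from the text; every other entry
# is an illustrative assumption.
CONSONANT_MAP = {
    "TH": "t",   # from the text: /th/ in 'thanks'
    "DH": "d",   # from the text: /th/ in 'that'
    "T": "t",    # assumed
    "K": "k",    # assumed
    "S": "s",    # assumed
    "NG": "ng",  # assumed: rendered as a two-consonant sequence
}

def map_consonants(phones):
    """Map known consonants; leave all other symbols (e.g. vowels) unchanged."""
    return [CONSONANT_MAP.get(p, p) for p in phones]

thanks = map_consonants(["TH", "AE1", "NG", "K", "S"])
that = map_consonants(["DH", "AE1", "T"])
```

Vowel symbols such as AE1 pass through untouched in this sketch, because their transcription depends on syllable structure and is not a plain lookup.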