Issue Number 17, May 2008, Page 20

Towards Arabic to English Machine Translation

Yasser Salem, Arnold Hensman and Brian Nolan
School of Informatics and Engineering
Institute of Technology Blanchardstown, Dublin, Ireland
Emails: {firstname.surname}@itb.ie

Abstract

This paper explores how the characteristics of the Arabic language will affect the development of a Machine Translation (MT) tool from Arabic to English. Several distinguishing features of Arabic pertinent to MT will be explored in detail, with reference to some potential difficulties that they might present. The paper will conclude with a proposed model incorporating the Role and Reference Grammar (RRG) technique to achieve this end.

1 Introduction

Arabic is a Semitic language originating in the area presently known as the Arabian Peninsula. It has been spoken in its current form since the 2nd millennium BCE. As a language, Arabic has few irregularities and it is rich in morphological structure. Arabic is also rare in that it is a derivational rather than a concatenative language. Words such as ‘went’ and ‘go’ (ذهب ؛ يذهب) can easily be seen as being part of a hierarchy of inheritance from a specific root (in this case ذهب). In English and in many other languages this is not always the case.

The Arabic language is written from right to left. It has 28 letters and many language-specific grammar rules, and it is a free-word-order language. Each Arabic letter represents a specific sound, so words can easily be spelled phonetically. There are no silent letters as in English. Similarly, there is no need to combine letters in Arabic to achieve a sound that is familiar to an English speaker. For example, the ‘th’ sound in English, as in the word ‘thinking’, is written in Arabic with the single character ث.

In addition to the standard challenges involved in developing an efficient translation tool from Arabic to English, the free-word-order nature of Arabic creates an obstacle unique to the language.
The number of possible clause combinations in basic phrasal structures far exceeds that of most languages. There is no copula verb ‘to be’ in Arabic, resulting in a unique usage of the subject ‘I’. The absence of the indefinite article, while not unique to Arabic, still poses many difficulties within the context of the language structure. These and other issues are discussed in later sections.

The remainder of this paper is organized as follows. Section 2 introduces some common features of Machine Translation and discusses generic problems that arise regardless of language. Section 3 presents the characteristics of the Arabic language. Section 4 discusses some distinguishing features of Arabic. Finally, Section 5 summarizes the findings and briefly outlines a proposed MT solution.