MACHINE TRANSLATION FROM ENGLISH TO ARABIC
Mouiad Alawneh, Nazlia Omar and Tengku Mohd Sembok
Faculty of Information Science and Technology, National University of Malaysia (UKM), Bangi, 43600, Malaysia
m_maradona86@yahoo.com
Abstract. Machine translation (MT) is the process of using computer software to translate text from one
natural language to another. It involves accounting for the grammatical structure of each language and
using rules, examples and grammars to transfer the grammatical structure of the source language (SL) into
the target language (TL). This paper presents an English-to-Arabic approach for translating well-structured
English sentences into well-structured Arabic sentences, using grammar-based and example-based
translation techniques to handle the problems of word ordering and agreement. The proposed methodology
is flexible and scalable. Its main advantages are, first, that the hybrid approach combines the advantages of
rule-based MT (RBMT) with those of example-based MT (EBMT), and second, that it can be applied to
other languages with minor modifications. The OAK Parser is used to analyze the input English text and
obtain the part of speech (POS) of each word as a pre-translation process. The implementation uses the C#
language, and validation rules have been applied in both the database design and the programming code to
ensure data integrity. A major design goal is that the system can be used as a stand-alone tool and can also
be integrated with a general machine translation system for English sentences.
Keywords: MT, Agreement, Word reorder, Rule-based, Example-based, Hybrid-based, OAK Parser, POS
1. Introduction
The present machine translation system helps the end user understand English sentences clearly by
generating precise corresponding Arabic text. Agreement is a basic property of
language. In the most basic sense, agreement occurs when two elements in the appropriate configuration
exhibit morphology consistent with their co-occurrence. Perhaps the most transparent case of this linguistic
mechanism is number agreement between a subject and a verb: A singular noun in the subject position
regularly co-occurs with a singular verb (e.g., “the dog runs”), and a plural subject noun regularly co-occurs
with a plural verb (e.g., “the dogs run”). If the language has number marking on other elements, such as
determiners or adjectives, these should also exhibit morphology that is consistent with their relationship to
the subject head noun, and this co-occurrence relationship holds for gender and person agreement as well.
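The number agreement described above can be illustrated with a minimal sketch. It is not the system's implementation: it assumes Penn Treebank-style POS tags (NN/NNS for nouns, VBZ/VBP for present-tense verbs), and the helper names `noun_number`, `verb_number` and `agrees_in_number` are hypothetical. It simply compares the number of the first noun with that of the first present-tense verb in a tagged sentence.

```python
# Toy check of subject-verb number agreement over POS-tagged tokens.
# Assumptions (not from the paper): Penn Treebank-style tags and a
# simplified rule that treats the first noun as the subject head.

SINGULAR_NOUN_TAGS = {"NN", "NNP"}
PLURAL_NOUN_TAGS = {"NNS", "NNPS"}


def noun_number(tag):
    """Map a noun tag to 'singular' or 'plural'; None for other tags."""
    if tag in SINGULAR_NOUN_TAGS:
        return "singular"
    if tag in PLURAL_NOUN_TAGS:
        return "plural"
    return None


def verb_number(tag):
    """Map a present-tense verb tag to the number it agrees with."""
    if tag == "VBZ":   # third-person singular present, e.g. "runs"
        return "singular"
    if tag == "VBP":   # non-third-person-singular present, e.g. "run"
        return "plural"
    return None


def agrees_in_number(tagged):
    """Return True if the first noun and the first present-tense verb
    in a list of (word, tag) pairs carry the same number."""
    subject = next((noun_number(t) for _, t in tagged if noun_number(t)), None)
    verb = next((verb_number(t) for _, t in tagged if verb_number(t)), None)
    if subject is None or verb is None:
        return True  # nothing to compare in this toy model
    return subject == verb


print(agrees_in_number([("the", "DT"), ("dog", "NN"), ("runs", "VBZ")]))
print(agrees_in_number([("the", "DT"), ("dogs", "NNS"), ("runs", "VBZ")]))
```

The first call covers "the dog runs" and the second the ill-formed "the dogs runs". A fuller treatment would identify the actual subject head from a parse and extend the same comparison to gender and person features.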
The modern Arabic dialects are well known for agreement asymmetries that are sensitive to word order.
These asymmetries have been attributed to a variety of causes: first, analysis problems in the source
language, and second, generation problems in the target language. However, Arabic is not alone in showing
word-order asymmetries for agreement; similar asymmetries have been documented in Russian, Hindi,
Slovene, French and Italian (Hutchins and Somers, 1992). Languages vary in their agreement requirements.
Some, like Arabic, require number, gender, person and case agreement, while others require only some of
these. Machine translation systems are developed using four approaches, which differ in difficulty and
complexity: rule-based, knowledge-based, corpus-based and hybrid MT. Rule-based machine translation
approaches can be classified into the following categories: direct machine translation, interlingua machine
translation and transfer-based machine translation (Abu Shquier and Sembok, 2008). The purpose of this
paper is to design a hybrid (rule-based and example-based) framework and hence to strike a balance
between both approaches in the use of MT for the translation of
2011 International Conference on Biomedical Engineering and Technology
IPCBEE vol.11 (2011) © (2011) IACSIT Press, Singapore