International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249-8958 (Online), Volume-9 Issue-2, December, 2019
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: B2320129219/2019©BEIESP
DOI: 10.35940/ijeat.B2320.129219
Journal Website: www.ijeat.org
Abstract: A machine translation system is needed to overcome
the language barrier faced by people living in the northeastern
region of India. Many people in this region cannot access
services because they do not understand the language in which
those services are offered. Assamese is one of the major
languages of northeast India, yet machine translation support
for Assamese is limited compared with other languages. As a
result, a large number of Assamese speakers cannot avail
themselves of the benefits of machine translation. This paper
focuses on the development of an English-to-Assamese
translation system using the n-gram model. Compared with other
models, the n-gram model works well for language pairs whose
syntax is highly dissimilar. The value of n plays a major role
in the quality and efficiency of the system: the Bilingual
Evaluation Understudy (BLEU) score differs significantly as n
changes. The model uses tuples to reduce memory consumption and
to speed up the translation process. A parallel corpus has been
used to train the n-gram-based decoder MARIE. The number of
translation units extracted using the n-gram model is much
smaller than the number extracted using a phrase-based model,
which has a strong impact on system efficiency.
Keywords: Statistical Machine Translation, N-gram, MARIE,
English-Assamese Translation, Tuple Extraction
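The tuple-based n-gram idea summarized in the abstract can be illustrated with a minimal sketch. This is not the MARIE implementation; it only assumes that each sentence pair has already been segmented into a sequence of bilingual tuples (source fragment, target fragment) and that the probability of a tuple is conditioned on the previous n-1 tuples. The function names, the placeholder tokens (s1, t1, ...) and the unsmoothed relative-frequency estimate are assumptions made for illustration; a practical system would add smoothing.

```python
from collections import Counter

# Hypothetical start-of-sentence tuple used for padding (an assumption).
START = ("<s>", "<s>")


def train_tuple_ngrams(tuple_sequences, n=2):
    """Count n-grams over sequences of bilingual (source, target) tuples."""
    ngram_counts = Counter()
    context_counts = Counter()
    for seq in tuple_sequences:
        padded = [START] * (n - 1) + list(seq)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[(context, padded[i])] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts


def tuple_prob(ngram_counts, context_counts, context, unit):
    """Relative-frequency estimate of p(unit | context), without smoothing."""
    denom = context_counts[tuple(context)]
    return ngram_counts[(tuple(context), unit)] / denom if denom else 0.0


def sequence_prob(model, seq, n=2):
    """Probability of a tuple sequence as a product of n-gram probabilities."""
    ngram_counts, context_counts = model
    padded = [START] * (n - 1) + list(seq)
    prob = 1.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        prob *= tuple_prob(ngram_counts, context_counts, context, padded[i])
    return prob


# Toy corpus of two tuple sequences; the tokens are placeholders, not real data.
corpus = [
    [("s1", "t1"), ("s2", "t2")],
    [("s1", "t1"), ("s3", "t3")],
]
model = train_tuple_ngrams(corpus, n=2)
print(sequence_prob(model, corpus[0]))  # prints 0.5
```

In this toy corpus both sequences start with the tuple (s1, t1), which is followed by (s2, t2) in one of the two cases, so the first sequence receives probability 1.0 x 0.5 = 0.5. Because source and target fragments are stored jointly in one unit, a single n-gram model over tuples plays the role of the translation model.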
I. INTRODUCTION
Machine translation is a subfield of computational linguistics
that deals with translation carried out by computers. In
the context of text translation, the user inputs text in one
standard language and the system translates it into
text in another standard language. During this process, the
system needs to follow rules specific to
the target language. Since syntax differs from language to
language, the system should understand the rules of each
language. The positions of the noun, verb and object differ
across languages; hence, care should be taken
that sentences are translated with proper meaning without
violating grammatical rules. The quality of a translation
is measured on the basis of factors such as the handling of
linguistic typology, the translation of idioms and how
anomalies are isolated.
Revised Manuscript Received on December 30, 2019.
* Correspondence Author
Zakir Hussain*, Research Scholar, Department of CSE, NIT Silchar,
Assam, India.
Malaya Dutta Borah, Assistant Professor, Department of CSE, NIT
Silchar, Assam, India.
Abdul Hannan, Faculty, Department of IT, Gauhati University, Assam,
India.
© The Authors. Published by Blue Eyes Intelligence Engineering and
Sciences Publication (BEIESP). This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Nowadays, machine translation has
gained much importance in different fields. Earlier, language
was a major barrier to sharing ideas and knowledge
between speakers of different languages. Machine translation has
attracted attention in various domains because of its ability
to bridge the gap created by that barrier. A significant demand
for translation of electronic text on the internet, such as web
pages, e-mail, social media, electronic chat and official
documents, has been observed over the past few
years. To fulfill this need, the research community in
machine translation refers to the following well-defined
approaches (see Fig. 1 for a pictorial view of the
approaches).
• Direct approach: The direct approach is the simplest
one. Translation is done on a word-by-word basis. In this
approach, no linguistic analysis of the source sentence is
taken into consideration when producing the target sentence.
Nowadays, this approach has been abandoned even in the
corpus-based framework.
• Rule-based approach: This approach is based on
rules written by human experts [11], [17]. Different
human experts may specify different rules for the translation
process. Hence, depending on who writes the rules, the system
will have a different configuration and a different efficiency.
This approach can be subdivided into:
- Transfer approach: The transfer approach has
three phases. The first is the analysis phase, in which the
source sentence is analyzed to produce an abstract
representation. The second is the transfer phase, in which the
abstract representation from the first phase is transferred
into an equivalent representation in the target language.
The third is the generation phase, in which the target sentence
is generated from this intermediate representation.
- Interlingua approach: This approach produces the
target sentence from an interlingua representation
created by a thorough analysis of the syntax and semantics of
the source sentence, so the source sentence is analyzed
deeply. The advantage of the interlingua approach is
that once the meaning of the source sentence is captured, it can
be expressed in any number of target languages.
• Corpus-based approach: This type of system
extracts knowledge by analyzing translation examples
from a parallel corpus. A parallel corpus is a collection of
texts in more than one language, each of which is an exact
translation of the others. The parallel corpus is developed by
human experts. Here, the translation system can be developed
as soon as the required technique is ready for a given pair of
languages. A corpus-based approach generally follows the direct
or the transfer approach. The corpus-based approach can be
further
N-gram based Machine Translation for
English-Assamese: Two Languages with High
Syntactical Dissimilarity
Zakir Hussain, Malaya Dutta Borah, Abdul Hannan