International Journal of Scientific & Engineering Research, Volume 2, Issue 6, June 2011
ISSN 2229-5518
IJSER © 2011
http://www.ijser.org
Addressing Challenges in Multilingual Machine
Translation
Prof. Rekha Sugandhi, Sayali Charhate, Anurag Dani, Amol Kawade
Abstract- Machine translation may be unidirectional or bidirectional between a pair of languages, or it may be multilingual.
A number of software systems have been developed to date, and advances continue to be made in this field to overcome language barriers
and create a borderless marketplace. Still, many challenges in this field of AI are yet to be overcome. The translation
quality of MT systems may be improved by developing better methods as well as by imposing certain restrictions on the input. These
challenges are most severe in multilingual machine translation, as compared to the bilingual case. This paper focuses on long-term challenges such as
high-quality MT for many more language pairs, training with limited data resources, robustness across domains, genres and language
styles, and achieving human-level translation quality and fluency.
Index Terms— Machine Learning, Machine Translation (MT), Natural Language Processing (NLP), Word sense disambiguation (WSD),
Interlingua, Source Language (SL), Target Language (TL), Artificial Intelligence.
—————————— ——————————
1 INTRODUCTION
Machine Translation (MT) is a sub-field of Artificial
Intelligence (AI) which involves the automated
translation of text from one natural language to another
with the help of a computer. At the most basic level, a machine
translator performs simple substitution of words in one
natural language for words in another, but that alone
usually cannot produce a good translation of a text, because
recognition of whole phrases and their closest counterparts
in the target language is needed. Every natural language
has its own grammatical structure and set of
rules. Thus, during translation, two things must be
taken care of: the meaning of the source language (SL)
text must be preserved, and the output must satisfy the
lexical rules of the target language (TL).
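The gap between word substitution and phrase recognition can be illustrated with a minimal sketch. The two lexicons below are toy English-to-French entries invented for this illustration, and the function names are hypothetical; a real system would use far larger resources and would also have to handle word order and agreement.

```python
# Toy English-to-French lexicons. All entries are illustrative assumptions,
# not taken from any real MT system.
WORD_LEXICON = {"yes": "oui", "of": "de", "course": "cours"}
PHRASE_LEXICON = {("of", "course"): "bien sûr"}


def translate_word_by_word(sentence):
    """Substitute each word independently; unknown words pass through."""
    words = sentence.lower().split()
    return " ".join(WORD_LEXICON.get(word, word) for word in words)


def translate_with_phrases(sentence):
    """Prefer the longest matching phrase, then fall back to single words."""
    words = sentence.lower().split()
    longest = max(len(key) for key in PHRASE_LEXICON)
    output = []
    i = 0
    while i < len(words):
        # Try multi-word phrases first, longest candidates before shorter ones.
        for n in range(longest, 1, -1):
            key = tuple(words[i:i + n])
            if key in PHRASE_LEXICON:
                output.append(PHRASE_LEXICON[key])
                i += n
                break
        else:
            # No phrase matched at this position: translate the single word.
            output.append(WORD_LEXICON.get(words[i], words[i]))
            i += 1
    return " ".join(output)


print(translate_word_by_word("Yes of course"))   # oui de cours
print(translate_with_phrases("Yes of course"))   # oui bien sûr
```

Word-by-word substitution renders the idiom literally, while the phrase lookup recovers its closest counterpart in the target language.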
Many challenges in this field of AI are yet to be
overcome. Multilingual MT mainly suffers from four types
of ambiguity [13]: lexical ambiguity, referential ambiguity,
scope ambiguity and structural ambiguity. Lexical and
structural ambiguities affect translation quality the most.
Lexical ambiguity arises when the same word has multiple
meanings, while structural ambiguity arises when the same
sentence admits multiple interpretations. During translation,
the corpus is processed a number of times to perform
different operations rather than being translated directly
into the TL. In this paper we present an overview of the
challenges involved in these processes and of the
long-term challenges in this field.
Section 2 focuses on challenges in the preprocessing of text,
i.e. the different types of analysis. Section 3 covers challenges
during training of the machine that affect overall
performance and accuracy. Section 4 discusses challenges in
dealing with multiple languages of different origins and
characteristics, and how these aspects have been
handled to date. Section 5 covers long-term
challenges, followed by some suggestions in Section 6.
Section 7 presents a proposed model.
————————————————
• Rekha S. Sugandhi, Assistant Professor at the Department of
Computer Engineering, MIT College of Engineering, Pune, is currently
pursuing her Ph.D. in Computer Engineering at SGB Amravati
University. She completed her M.Tech in Computer Engineering and
B.E. in Computer Engineering at the University of Pune. Her research
interests include Machine Learning, Natural Language
Processing and Theory of Computation. PH-+91-02030273130.
E-mail: rekha.sugandhi@gmail.com
• Sayali Charhate is currently pursuing her bachelor's degree in Computer
Engineering at the University of Pune.
• Anurag Dani is currently pursuing his bachelor's degree in Computer
Engineering at the University of Pune.
• Amol Kawade is currently pursuing his bachelor's degree in Computer
Engineering at the University of Pune.
2 CHALLENGES IN ANALYSIS
Analysis of the input text in the SL consists of morphological
analysis, syntactic analysis and semantic analysis. Each
kind of analysis poses its own challenges for MT.
2.1 In Morphological Analysis
No language has fully general grammatical rules [1]
that we could use to reduce the size of the lexicon.
For example, the suffix ‘er’ is used to indicate a person performing an
action: one who ‘farms’ is a ‘farmer’. But this is not
always the case: one who ‘cooks’ is known as a ‘cook’,
one who ‘drafts’ is called a ‘draftsman’, and so on.
Similarly, the different forms of a verb are obtained regularly, as in ‘book’,
‘book-s’, ‘book-ing’, ‘book-ed’. But there are many
exceptions too, such as ‘do’, ‘does’, ‘doing’, ‘did’, ‘done’ or
‘sing’, ‘sings’, ‘singing’, ‘sang’, ‘sung’. So if we reduce the size
of the lexicon using this approach, it will certainly expose the
minimal units of grammatical analysis across a vast amount of
language data, but it may also amount to an imperfect
attempt to describe something which is too complex [17].
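The interplay of regular suffix rules and exceptions described above can be sketched as follows. This is a minimal illustration built only from the examples in the text; the table layout, form labels and function names are assumptions made for the sketch, and real orthographic rules (such as consonant doubling or dropping a final ‘e’) are deliberately ignored.

```python
# Regular suffixes for English verb forms (form labels are assumptions).
REGULAR_SUFFIXES = {"3sg": "s", "gerund": "ing", "past": "ed", "participle": "ed"}

# Irregular verbs must be listed in full, which is why suffix rules
# alone cannot shrink the lexicon.
IRREGULAR_VERBS = {
    "do": {"3sg": "does", "gerund": "doing", "past": "did", "participle": "done"},
    "sing": {"3sg": "sings", "gerund": "singing", "past": "sang", "participle": "sung"},
}

# Agent nouns that do not follow the 'verb + er' rule.
IRREGULAR_AGENTS = {"cook": "cook", "draft": "draftsman"}


def inflect(stem, form):
    """Return an inflected form, consulting the exception table first."""
    if stem in IRREGULAR_VERBS:
        return IRREGULAR_VERBS[stem][form]
    return stem + REGULAR_SUFFIXES[form]


def agent_noun(verb):
    """Return the noun for 'one who <verb>s', defaulting to the 'er' suffix."""
    return IRREGULAR_AGENTS.get(verb, verb + "er")


print(inflect("book", "past"))   # booked
print(inflect("sing", "past"))   # sang
print(agent_noun("farm"))        # farmer
print(agent_noun("draft"))       # draftsman
```

Every exception needs its own explicit entry, so the lexicon grows with each irregular word, and the tables above cover English only.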
Moreover, derivational morphology tends to change the
category (POS) of a word, unlike inflectional
morphology [1]. As languages vary greatly, even if we
find a rule that is applicable in most cases,
it will be based on the experience of only one language.
Consequently, it will be contradicted by counterexamples from other