International Journal of Scientific & Engineering Research, Volume 2, Issue 6, June-2011, ISSN 2229-5518
IJSER © 2011, http://www.ijser.org

Addressing Challenges in Multilingual Machine Translation

Prof. Rekha Sugandhi, Sayali Charhate, Anurag Dani, Amol Kawade

Abstract: The machine translation process may be unidirectional or bidirectional between a pair of languages, or it can be multilingual. A number of software systems have been developed to date, and advancements continue in this field to overcome language barriers and create a borderless marketplace. Still, many challenges in this field of AI are yet to be overcome. The translation quality of MT systems may be improved by developing better methods as well as by imposing certain restrictions on the input. All of these challenges are more acute in multilingual machine translation than in the bilingual case. This paper focuses on long-term challenges such as high-quality MT for many more language pairs, training with limited data resources, robustness across domains, genres and language styles, and achieving human-level translation quality and fluency.

Index Terms: Machine Learning, Machine Translation (MT), Natural Language Processing (NLP), Word Sense Disambiguation (WSD), Interlingua, Source Language (SL), Target Language (TL), Artificial Intelligence.

1 INTRODUCTION

Machine Translation (MT) is a sub-field of Artificial Intelligence (AI) that involves the automated translation of text from one natural language to another with the help of a computer. At the basic level, a machine translator performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Every natural language has its own grammatical structure and set of rules.
Thus, during translation, two things must be taken care of: the meaning of the source language (SL) text must be retained, and the output must satisfy the lexical rules of the target language (TL). There are many challenges in this field of AI which are yet to be overcome. Multilingual MT mainly suffers from four types of ambiguity [13]: lexical ambiguity, referential ambiguity, scope ambiguity and structural ambiguity. Lexical and structural ambiguities affect the quality of a translator the most. Lexical ambiguity arises from multiple meanings of the same word, while structural ambiguity arises from multiple interpretations of the same sentence. During translation, the corpus is processed a number of times to perform different operations rather than being translated directly into the TL.

————————————————
• Rekha S. Sugandhi, working as Assistant Professor at the Department of Computer Engineering, MIT College of Engineering, Pune, is currently pursuing her Ph.D. in Computer Engineering at the SGB, Amravati University. She has completed her M.Tech in Computer Engineering and B.E. in Computer Engineering from the University of Pune. Her research area and areas of interest include Machine Learning, Natural Language Processing and Theory of Computation. PH-+91-02030273130. E-mail: rekha.sugandhi@gmail.com
• Sayali Charhate is currently pursuing her bachelor's degree in Computer Engineering from the University of Pune.
• Anurag Dani is currently pursuing his bachelor's degree in Computer Engineering from the University of Pune.
• Amol Kawade is currently pursuing his bachelor's degree in Computer Engineering from the University of Pune.
————————————————

In this paper we present an overview of the challenges involved in these processes and of the long-term challenges in this field. Section 2 focuses on challenges in the preprocessing of text, i.e. the different types of analysis. Section 3 covers challenges during training of the machine which affect overall performance and accuracy.
Section 4 is about challenges in dealing with multiple languages of different origins and with different characteristics, and how these aspects have been handled to date. Section 5 covers long-term challenges, followed by some suggestions in Section 6. Section 7 contains a proposed model.

2 CHALLENGES IN ANALYSIS

Analysis of input text from the SL consists of morphological analysis, syntactic analysis and semantic analysis. Each kind of analysis poses some challenges in MT.

2.1 In Morphological Analysis

No language has fully general grammatical rules [1] that we could use to reduce the size of the lexicon. For example, the suffix ‘er’ is used to indicate a person performing an action: one who ‘farms’ is a ‘farmer’. But this is not always the case: one who ‘cooks’ is known as a ‘cook’, one who ‘drafts’ is called a ‘draftsman’, and so on. Similarly, the different forms of a verb can be obtained by suffixation, as in ‘book’, ‘book-s’, ‘book-ing’, ‘book-ed’. But there are many exceptions here too, such as ‘do’, ‘does’, ‘doing’, ‘did’, ‘done’ or ‘sing’, ‘sings’, ‘singing’, ‘sang’, ‘sung’. So if we reduce the size of the lexicon using this approach, it will certainly expose the minimal units of grammatical analysis in a vast amount of language data. However, this technique may also amount to an imperfect attempt to describe something which is too complex [17]. Moreover, derivational morphology tends to change the category of a word (its part of speech, POS), unlike inflectional morphology [1]. As languages vary greatly, even if we find a rule that is applicable in most cases, it will be based on the experience of only one language. Consequently, it will get counter exampled from other M