International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958 (Online), Volume-9 Issue-2, December 2019. Published By: Blue Eyes Intelligence Engineering & Sciences Publication. Retrieval Number: B2320129219/2019©BEIESP. DOI: 10.35940/ijeat.B2320.129219. Journal Website: www.ijeat.org

Abstract: A machine translation system is needed to bridge the language barrier faced by people living in the northeastern region of India. Many people in this region cannot access services because they do not understand the language in which those services are offered. Assamese is one of the major languages spoken in northeast India, yet machine translation support for Assamese is limited compared with other languages. As a result, a large number of Assamese speakers cannot make use of the benefits that machine translation provides. This paper focuses on the development of an English-to-Assamese translation system using the n-gram model. Compared with other models, the n-gram model works very well for language pairs with highly dissimilar syntax. The value of n plays a major role in the quality and efficiency of the system: the Bilingual Evaluation Understudy (BLEU) score differs significantly as n changes. The model uses tuples to reduce memory consumption and to accelerate the translation process. A parallel corpus has been used to train the n-gram based decoder called MARIE. The number of translation units extracted using the n-gram model is much smaller than the number extracted using a phrase-based model, which has a high impact on system efficiency.

Keywords: Statistical Machine Translation, N-gram, MARIE, English-Assamese Translation, Tuple Extraction

I. INTRODUCTION

Machine translation is a subfield of computational linguistics that deals with translation carried out by computers.
In the context of text translation, the user inputs text in one standard language and the system translates it into text in another standard language. During this process, the system must follow rules that apply specifically to the target language. Since syntax differs from language to language, the system should understand the rules of each language involved. The positions of the noun, verb and object differ across languages; hence, care must be taken that sentences are translated with their proper meaning and without violating grammatical rules. The quality of the translation is measured on the basis of factors such as how linguistic typology is handled, how idioms are translated and how anomalies are isolated.

Revised Manuscript Received on December 30, 2019.
* Correspondence Author
Zakir Hussain*, Research Scholar, Department of CSE, NIT Silchar, Assam, India.
Malaya Dutta Borah, Assistant Professor, Department of CSE, NIT Silchar, Assam, India.
Abdul Hannan, Faculty, Department of IT, Gauhati University, Assam, India.
© The Authors. Published by Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Nowadays, machine translation has gained much importance in different fields. Earlier, language was a major barrier to sharing ideas and knowledge between speakers of different languages. Machine translation has attracted attention from various domains of work because of its ability to bridge the gap created by that barrier. A significant demand for the translation of electronic text on the internet, such as web pages, e-mail, social media, electronic chat and official documents, has been observed over the past few years.
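The word-order point made above (that the positions of the noun, verb and object differ across languages) can be illustrated with a minimal sketch. It assumes hand-assigned role tags (S, V, O) on a toy English sentence, and it relies on the general description of Assamese as a verb-final (subject-object-verb) language, which this section does not itself state. The function name `reorder_svo_to_sov` and the tag scheme are hypothetical and chosen only for illustration; this is not the method used by the MARIE decoder.

```python
from typing import List, Tuple

# Each token is (word, role). Roles are hand-assigned for illustration only:
# "S" = subject, "V" = verb, "O" = object. This tagging scheme is an assumption.
Token = Tuple[str, str]


def reorder_svo_to_sov(tokens: List[Token]) -> List[str]:
    """Return the words with all verb tokens moved after the object tokens.

    Relative order inside each role group is preserved.
    """
    subject = [word for word, role in tokens if role == "S"]
    verb = [word for word, role in tokens if role == "V"]
    obj = [word for word, role in tokens if role == "O"]
    return subject + obj + verb


# Toy subject-verb-object sentence in English.
sentence = [("Ram", "S"), ("eats", "V"), ("rice", "O")]
print(" ".join(reorder_svo_to_sov(sentence)))  # prints "Ram rice eats"
```

The English glosses come out in verb-final order, which is the kind of reordering a translation system has to capture when the source and target languages place the verb differently.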
To meet the need for translation, the research community in the area of machine translation refers to the following well-defined approaches (please refer to Fig. 1 for a pictorial view of the approaches).
• Direct approach: The direct approach is the simplest one. Translation is done on a word-by-word basis, and no linguistic analysis of the source sentence is taken into consideration when producing the target sentence. Nowadays this approach has been abandoned, even in the corpus-based framework.
• Rule based approach: This approach is based on rules written by human experts [11], [17]. Different experts may specify different rules for the translation process; hence, systems built by different people will differ in configuration and efficiency. This approach can be subdivided into:
- Transfer approach: The transfer approach has three phases. The first is the analysis phase, in which the source sentence is analyzed to produce an abstract representation. The second is the transfer phase, in which the abstract representation from the first phase is transferred into an equivalent representation in the target language. The third is the generation phase, in which the target sentence is generated from this intermediate representation.
- Interlingua approach: This approach produces the target sentence from an Interlingua representation created by a thorough, deep analysis of the syntax and semantics of the source sentence. The advantage of the Interlingua approach is that once the meaning of the source sentence is grasped, it can be articulated in any number of target languages.
• Corpus based approach: This type of system extracts knowledge by analyzing translation examples from a parallel corpus. A parallel corpus is a collection of texts in more than one language, in which each text is an exact translation of the others. The parallel corpus is developed by human experts.
Here, the translation system can be developed as soon as the required technique is ready for a given pair of languages. A corpus-based approach generally follows the direct or the transfer approach.

N-gram based Machine Translation for English-Assamese: Two Languages with High Syntactical Dissimilarity
Zakir Hussain, Malaya Dutta Borah, Abdul Hannan

The corpus-based approach can be further