Bengali to Assamese Statistical Machine Translation using Moses (Corpus Based)

Nayan Jyoti Kalita 1, Baharul Islam 2
1 Department of CSE, Royal School of Engineering and Technology
2 Department of IT, Gauhati University
Guwahati, India
{nayan.jk.123, islambaharul65}@gmail.com

Abstract—Machine translation plays a major role in facilitating man-machine communication as well as human-to-human communication in Natural Language Processing (NLP). Machine Translation (MT) refers to using a machine to convert text from one language to another. Statistical Machine Translation (SMT) is a type of MT consisting of a Language Model (LM), a Translation Model (TM) and a decoder. In this paper, a Bengali to Assamese Statistical Machine Translation model has been created using Moses. Other translation tools, namely IRSTLM for the language model and GIZA++ (GIZA-PP-V1.0.7) for the translation model, are used within this framework, which is available in Linux environments. The purpose of the LM is to encourage fluent output, and the purpose of the TM is to encourage similarity between input and output; the decoder maximizes the probability of the translated text in the target language. A parallel corpus of 17100 sentences in Bengali and Assamese has been used for training within this framework. Statistical MT techniques have not so far been widely explored for Indian languages. It would be interesting to find out to what extent these models can help the large ongoing MT efforts in the country.

I. INTRODUCTION

Multilingualism is considered to be a part of democracy. With the increasing growth of technology, the language barrier should not be a problem. It becomes important to provide information to people as and when needed, and in their native language. Machine translation is not primarily an area of abstract intellectual inquiry but the application of computer and language sciences to the development of systems answering practical needs.
The focus of the research presented here was to investigate the effectiveness of phrase-based statistical Bengali-Assamese translation using the Moses toolkit. The field of natural language processing (NLP) began approximately five decades ago with machine translation systems. In 1946, Warren Weaver and Andrew Donald Booth discussed the technical feasibility of machine translation by means of the techniques developed during World War II for breaking enemy codes [1]. During the more than fifty years of its existence, the field has evolved from the dictionary-based machine translation systems of the fifties to the more adaptable, robust, and user-friendly NLP environments of the nineties.

Machine Translation

Machine translation is the name for computerized methods that automate all or part of the process of translating from one language to another. In a large multilingual society like India, there is great demand for translation of documents from one language to another. There are 22 constitutionally approved languages, which are officially used in different states. There are about 1650 dialects spoken by different communities. There are 10 Indic scripts. These languages are well developed and rich in content. They have similar scripts and grammars, and their alphabetic order is also similar. Some languages share a common script, especially Devanagari. Hindi written in the Devanagari script is the official language of the Government of India. English is also used for government notifications and communications. India's average literacy level is 65.4 percent (Census 2001). Less than 5 percent of people can either read or write English.
As most state governments work in their regional languages, while the central government's official documents and reports are in English or Hindi, these documents have to be translated into the respective regional languages to communicate properly with the people. Work in the area of Machine Translation in India has been going on for several decades. During the early 90s, advanced research in the fields of Artificial Intelligence and Computational Linguistics led to promising developments in translation technology. This helped in the development of usable Machine Translation systems in certain well-defined domains. Since 1990, research on MT systems between Indian and foreign languages, and also between Indian languages, has been going on in various organizations. Translation between structurally similar languages like Hindi and Punjabi is easier than translation between language pairs with wide structural differences, like Hindi and English. Translation systems between closely related languages are easier to develop since they have many parts of their grammars and vocabularies in common [2].

The organization of the paper is as follows. Section II gives an outline of the Assamese and Bengali languages. Section III describes related work on machine translation. Section IV gives an outline of machine translation as well as of statistical machine translation. In Section V, the design and implementation of the system are discussed. Section VI gives the results obtained from our experiment. Section VII concludes the paper.