Context Based MTS for Translating Gujarati Trigram and Bigram Idioms to English

Jatin C. Modh
Research Scholar, Gujarat Technological University, Ahmedabad, Gujarat, India.
jatinmodh@yahoo.com

Jatinderkumar R. Saini
Professor & Director, Symbiosis Institute of Computer Studies and Research, Pune, Maharashtra, India.
saini_expert@yahoo.com

Abstract: Gujarati is the official language of the state of Gujarat, located in the western region of India. A Machine Translation System (MTS) translates text from one language to another. Based on our review, we found that very few machine translation systems are available that convert Gujarati text into English. This paper focuses on the translation of Gujarati trigram and bigram idioms. An idiom is defined as a token sequence whose meaning is different from the literal meaning of its individual tokens. The proposed Gujarati to English idiom translator accurately translates trigram and bigram idioms. We have created a corpus of nearly 3000 n-gram idioms, and in this corpus we have found nearly 890 trigram idioms and 1735 bigram idioms.

Keywords: Trigram, English, Gujarati, Idiom, Machine Translation System (MTS).

I. INTRODUCTION

Gujarati is the native language of the state of Gujarat, located in the western region of India. It is one of the official languages recognized by the Government of India. Gujarati is widely used as a medium of communication in Gujarat. Research involving Natural Language Processing (NLP) of Gujarati has been presented for diacritic identification [11], information retrieval [12], identification of stop words [13], categorization of stop words [14], a Machine Translation System (MTS) for the Sanskrit-Gujarati pair [15], comparison of morphologically analyzed words [16], bilingual dictionary implementation [17], a constituency mapper [18] and classification [19], to name a few.
For automated translation from Gujarati to English, a Machine Translation System is needed. The current research work deals with the translation of Gujarati idioms. An idiom is a word or sequence of words whose collective meaning differs from the literal meaning of its words. It is popularly known as “Rudhiprayog” in Gujarati. The meaning of an idiom can be understood from its use in a sentence. The objective of the current work is to build a Machine Translation System that correctly translates Gujarati idioms into English.

II. RELATED LITERATURE REVIEW

Various machine translation projects have been carried out for different pairs of languages worldwide. India is a country in which each state has a different native language, so there is a huge demand for Machine Translation Systems that translate from one local language to another local language or to English. The scope of this paper is limited to the Gujarati to English language pair only.

A. Literature Review about Machine Translation Systems:

As per the literature review [1], we found four Machine Translation Systems, namely MANTRA, Google Translate, ANUVADAKSH/EILMT [2] and AnglaBharati-II, that translate text from English to Gujarati [3][10]. Google Translate is the only Machine Translation System that also translates text from Gujarati to English [1]. Google Translate is a free and well-known web-based machine translation service developed by Google. It supports translation for more than 100 languages. We have found that, for many examples, it gives erroneous results when translating text containing Gujarati idioms.

B. Literature Review about context identification and idiom translation:

Turney described context as a widely used but ill-defined term. Context is the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood.
Contextual features are useful only when they are considered in combination with other features [4]. Sekiya et al. discussed that the context of news can be specified by two approaches: Bag-of-Words (BOW) and relational information [5]. Dhariya et al. proposed a hybrid approach for MTS using the Hindi-English pair. Their system works with only four types of tenses. Example-based methods are proposed for sentences that include idioms. The machine translation segment handles sentences that contain ambiguity. Various rules have been defined for resolving conflicts between proper nouns and dictionary words [6]. Mishra et al. proposed a hybrid approach for Hindi to English idiom translation by considering three cases of idioms. They proposed three ways of translating Hindi idioms, based on whether the idioms in the two languages have similar meaning and similar form, similar meaning and dissimilar form, or different meaning and different form [7]. Salton et al. discussed and evaluated a substitution-based idiom translation technique. They found that the method is not a complete solution for multiword expressions and that the system's efficiency depends on idiom fixedness only [8].

2020 International Conference for Emerging Technology (INCET), Belgaum, India. Jun 5-7, 2020. 978-1-7281-6221-8/20/$31.00 ©2020 IEEE
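To make the idea of substitution-based idiom handling [8] concrete for trigram and bigram idioms, the following is a minimal sketch, not the method of any cited system. It assumes whitespace tokenization and a small lookup table keyed by token tuples. The table `IDIOMS`, the function `substitute_idioms`, and the tokens `w0` to `w6` are hypothetical placeholders; they are not real Gujarati idioms and are not taken from the corpus described in this paper. The sketch tries the longest n-gram first, so a trigram idiom takes priority over a bigram idiom contained in it.

```python
from typing import Dict, List, Tuple

# Hypothetical idiom table: token tuples mapped to English meanings.
# The entries are placeholders, not real Gujarati idioms.
IDIOMS: Dict[Tuple[str, ...], str] = {
    ("w1", "w2", "w3"): "<meaning of trigram idiom>",
    ("w2", "w3"): "<meaning of bigram idiom>",
    ("w5", "w6"): "<meaning of another bigram idiom>",
}


def substitute_idioms(
    tokens: List[str],
    idioms: Dict[Tuple[str, ...], str] = IDIOMS,
    max_n: int = 3,
) -> List[str]:
    """Replace known n-gram idioms with their meanings, longest match first."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try trigrams before bigrams at the current position.
        for n in range(max_n, 1, -1):
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in idioms:
                out.append(idioms[key])
                i += n
                break
        else:
            # No idiom starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    sentence = "w0 w1 w2 w3 w4 w5 w6"
    print(substitute_idioms(sentence.split()))
```

A lookup of this kind only matches an idiom in one fixed surface form, and it cannot tell an idiomatic use from a literal one. This is consistent with the observation above that substitution alone is not a complete solution and depends on idiom fixedness.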