Context Based MTS for Translating Gujarati
Trigram and Bigram Idioms to English
Jatin C. Modh
Research Scholar,
Gujarat Technological University,
Ahmedabad, Gujarat, India.
jatinmodh@yahoo.com
Jatinderkumar R. Saini
Professor & Director,
Symbiosis Institute of Computer Studies and Research,
Pune, Maharashtra, India.
saini_expert@yahoo.com
Abstract—Gujarati is the official language of the state of
Gujarat, located in the western region of India. A Machine
Translation System (MTS) translates text from one language to
another. Based on our review, we found that very few machine
translation systems convert Gujarati text into English. This
paper focuses on the translation of Gujarati trigram and bigram
idioms. An idiom is defined as a token sequence whose meaning
differs from the literal meaning of its individual tokens. The
proposed Gujarati-to-English idiom translator accurately
translates trigram and bigram idioms. We have created a corpus
of nearly 3000 n-gram idioms, which contains nearly 890 trigram
idioms and 1735 bigram idioms.
Keywords—Trigram, English, Gujarati, Idiom, Machine
Translation System (MTS).
I. INTRODUCTION
Gujarati is the native language of the state of Gujarat,
located in the western region of India. It is one of the official
languages recognized by the Government of India, and it is
widely used as a medium of communication in Gujarat. Research
on Natural Language Processing (NLP) of Gujarati has
addressed diacritic identification [11], information retrieval
[12], identification of stop words [13], categorization of stop
words [14], a Machine Translation System (MTS) for the
Sanskrit-Gujarati pair [15], comparison of morphologically
analyzed words [16], bilingual dictionary implementation
[17], a constituency mapper [18] and classification [19], to
name a few. Automated translation from Gujarati to English
requires a Machine Translation System.
The current research work deals with the translation of
Gujarati idioms. An idiom is a word or sequence of words
whose collective meaning differs from the literal meaning of
its words. Idioms are popularly known as “Rudhiprayog” in
Gujarati. The meaning of an idiom can be understood from its
use in a sentence. The objective of the current work is to
build a Machine Translation System that correctly translates
Gujarati idioms into English.
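As a rough illustration of how trigram and bigram idioms might be located in a tokenized sentence, the sketch below scans the tokens and tries a trigram match before a bigram match, so that the longest known idiom wins. The lexicon `IDIOM_LEXICON`, its placeholder entries, the function `find_idioms` and the whitespace tokenization are all hypothetical assumptions for illustration; this is not the authors' data or implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: placeholder tokens stand in for Gujarati words,
# and the glosses stand in for English meanings (no real data).
IDIOM_LEXICON: Dict[Tuple[str, ...], str] = {
    ("A", "B", "C"): "<meaning of trigram idiom ABC>",
    ("A", "B"): "<meaning of bigram idiom AB>",
    ("D", "E"): "<meaning of bigram idiom DE>",
}


def find_idioms(sentence: str,
                lexicon: Dict[Tuple[str, ...], str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, gloss) token spans, trying trigrams before bigrams."""
    tokens = sentence.split()  # assumption: whitespace tokenization
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        for n in (3, 2):  # longest match first
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in lexicon:
                spans.append((i, i + n, lexicon[key]))
                i += n  # continue after the matched idiom
                break
        else:  # no idiom starts at token i
            i += 1
    return spans


if __name__ == "__main__":
    print(find_idioms("X A B C Y D E", IDIOM_LEXICON))
    # → [(1, 4, '<meaning of trigram idiom ABC>'), (5, 7, '<meaning of bigram idiom DE>')]
```

Trying n = 3 before n = 2 keeps a bigram idiom that is a prefix of a trigram idiom (here `A B` inside `A B C`) from pre-empting the longer match.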
II. RELATED LITERATURE REVIEW
Various Machine Translation projects have been carried
out worldwide for different language pairs. India is a
country in which different states have different native
languages, so there is a huge demand for Machine Translation
Systems that translate from one local language to another
local language or to English. The scope of this paper is
limited to the Gujarati-English language pair.
A. Literature Review about Machine Translation Systems:
Based on our literature review [1], we found four Machine
Translation Systems, namely MANTRA, Google Translate,
ANUVADAKSH/EILMT [2] and AnglaBharati-II, that translate
text from English to Gujarati [3][10]. Of these, Google
Translate is the only Machine Translation system that also
translates text from Gujarati to English [1]. Google Translate
is a free, well-known web-based Machine Translation service
developed by Google. It supports translation for more than
100 languages. We found that it gives erroneous results for
many examples of text containing Gujarati idioms.
B. Literature Review about context identification and
idiom translation:
Turney described context as a widely used but ill-defined
term. Context refers to the circumstances that form the setting
for an event, statement, or idea, and in terms of which it can
be fully understood. Contextual features are useful only when
they are considered in combination with other features [4].
Sekiya et al. discussed two approaches for specifying the
context of news: Bag-of-Words (BOW) and relational
information [5].
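To make the Bag-of-Words idea concrete, the sketch below represents the context of a text as an unordered multiset of word counts and compares two contexts with cosine similarity. This is a generic BOW illustration under our own assumptions (lowercased whitespace tokens, cosine similarity as the comparison measure, hypothetical function names `bag_of_words` and `cosine_similarity`), not the specific method of Sekiya et al. [5].

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Represent a text as word counts, ignoring word order."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    ctx1 = bag_of_words("the market fell as shares dropped")
    ctx2 = bag_of_words("shares dropped and the market fell")
    ctx3 = bag_of_words("rain fell on the village")
    print(round(cosine_similarity(ctx1, ctx2), 3),
          round(cosine_similarity(ctx1, ctx3), 3))
    # → 0.833 0.365
```

Because BOW discards word order, two sentences containing the same words in a different order map to the same vector, so any information carried by word order or by relations between words is lost.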
Dhariya et al. proposed a hybrid approach to MTS for the
Hindi-English pair. Their system works with only four types
of tenses. Example-based methods are proposed for sentences
that include idioms, and the machine translation segment
handles sentences that are ambiguous. Various rules are
defined for resolving conflicts between proper nouns and
dictionary words [6].
Mishra et al. proposed a hybrid approach for Hindi-to-English
idiom translation that considers three cases of idioms. They
proposed three ways of translating Hindi idioms, depending on
whether the idioms in the two languages have similar meaning
and similar form, similar meaning and dissimilar form, or
different meaning and different form [7].
Salton et al. discussed and evaluated a substitution-based
idiom translation technique. They found that the method is
not a complete solution for multiword expressions and that
the efficiency of the system depends only on idiom fixedness
[8].
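The dependence of substitution on idiom fixedness can be seen in a toy example. The sketch below replaces exact occurrences of known idiom token sequences with a literal paraphrase before the text would be passed to a translation system; an idiom whose tokens are inflected or interrupted is not matched. The table `SUBSTITUTIONS`, the English example idioms and the function `substitute_idioms` are hypothetical and only illustrate the general idea, not the system evaluated by Salton et al. [8].

```python
from typing import Dict, Tuple

# Hypothetical substitution table: fixed token sequence -> literal paraphrase.
# English idioms are used here purely for readability.
SUBSTITUTIONS: Dict[Tuple[str, ...], str] = {
    ("kick", "the", "bucket"): "die",
    ("spill", "the", "beans"): "reveal the secret",
}


def substitute_idioms(sentence: str, table: Dict[Tuple[str, ...], str]) -> str:
    """Replace exact token-sequence matches with their paraphrase."""
    tokens = sentence.split()
    out = []
    i = 0
    longest = max((len(k) for k in table), default=0)
    while i < len(tokens):
        for n in range(longest, 0, -1):  # longest match first
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in table:
                out.append(table[key])
                i += n
                break
        else:  # no idiom starts here: keep the literal token
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(substitute_idioms("he will kick the bucket soon", SUBSTITUTIONS))
    # → he will die soon
    print(substitute_idioms("he kicked the bucket", SUBSTITUTIONS))
    # → he kicked the bucket
```

The second call shows the limitation: the inflected form `kicked` breaks the exact match, so the idiom would reach the translation system untouched and could be translated literally. Handling such variants would need lemmatization or more flexible matching.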
2020 International Conference for Emerging Technology (INCET)
Belgaum, India. Jun 5-7, 2020
978-1-7281-6221-8/20/$31.00 ©2020 IEEE