International Journal on Natural Language Computing (IJNLC) Vol.11, No.1, February 2022
DOI: 10.5121/ijnlc.2022.11104

A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION

Redwan Ahmed Rizvee, Asif Mahmood, Shakur Shams Mullick and Sajjadul Hakim
Tiger IT Bangladesh Limited, Dhaka, Bangladesh

ABSTRACT

Phonetic typing using the English alphabet has become widely popular for social media and chat services. As a result, texts containing a mix of English and Bangla words and phrases have become increasingly common. Existing transliteration tools perform poorly on such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can satisfactorily transliterate both English words and phonetically typed Bangla words. This is achieved by adopting a hybrid of dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT, which significantly outperforms the benchmark transliteration tool.

KEYWORDS

Transliteration framework, phonetic typing, English to Bangla, hybrid framework, THT

1. INTRODUCTION

In this era of globalization, people are exposed to information from global sources as never before. Especially with the advent of the internet and smartphones, access to information in foreign languages has become increasingly common. Machine Translation (MT) can play a crucial role here [1], as it assists information exchange by converting foreign-language texts into a person’s native language. One particular challenge for MT is handling named entities (NE), for which transliteration is often preferable to translation. Transliteration refers to the phonetic conversion of words across different pairs of languages [2]–[4]. However, it is a challenging task, since pronunciation rules vary across languages and exact or similar-sounding phonemes are sometimes unavailable in the target language.
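The abstract describes THT as a hybrid of dictionary-based and rule-based techniques. As a rough illustration of that general idea only (not THT's actual three stages), the Python sketch below looks each word up in a small dictionary and falls back to greedy longest-match phonetic rules when the word is not found. The dictionary entries, the rule tables and the function names (`rule_based`, `transliterate_word`, `transliterate`) are hypothetical; a realistic rule set would also need conjuncts (hasanta), more vowel forms and many more letters.

```python
# Minimal sketch of dictionary-first transliteration with a rule-based
# fallback. All tables and function names are hypothetical illustrations.

# Hypothetical dictionary of English words with fixed Bangla spellings.
DICTIONARY = {
    "computer": "কম্পিউটার",
    "facebook": "ফেসবুক",
}

# Hypothetical phonetic rules: Latin key -> Bangla consonant (or sign).
CONSONANTS = {
    "kh": "খ", "bh": "ভ", "sh": "শ", "ng": "ং",
    "k": "ক", "g": "গ", "j": "জ", "t": "ত", "d": "দ", "n": "ন",
    "p": "প", "b": "ব", "m": "ম", "r": "র", "l": "ল", "s": "স", "h": "হ",
}

# Latin vowel -> (independent letter, vowel sign used after a consonant).
# After a consonant, "o" maps to the inherent vowel, i.e. no sign at all.
VOWELS = {
    "a": ("আ", "া"), "i": ("ই", "ি"), "u": ("উ", "ু"),
    "e": ("এ", "ে"), "o": ("অ", ""),
}


def rule_based(word: str) -> str:
    """Greedy longest-match conversion of a phonetically typed word."""
    out = []
    i = 0
    after_consonant = False
    while i < len(word):
        step = 1
        for size in (2, 1):  # try two-letter keys before one-letter keys
            chunk = word[i:i + size]
            if len(chunk) != size:
                continue
            if chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                # The anusvara (ং) is a sign, not a consonant that takes a vowel.
                after_consonant = chunk != "ng"
                step = size
                break
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                out.append(sign if after_consonant else independent)
                after_consonant = False
                step = size
                break
        else:
            out.append(word[i])  # no rule applies: copy the character through
            after_consonant = False
        i += step
    return "".join(out)


def transliterate_word(word: str) -> str:
    key = word.lower()
    if key in DICTIONARY:  # dictionary-based step for known words
        return DICTIONARY[key]
    return rule_based(key)  # rule-based fallback for everything else


def transliterate(text: str) -> str:
    return " ".join(transliterate_word(w) for w in text.split())


print(transliterate("ami facebook bangla"))
```

With these toy tables, `ami`, `bangla` and `kobita` come out as আমি, বাংলা and কবিতা, while `facebook` is taken from the dictionary. A word such as `tomake`, however, comes out as তমাকে rather than তোমাকে, which shows why purely character-level rules struggle with real phonetic typing.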
An example of this is transliterating the proper noun Parvez/Parves (a Persian name) into Arabic, whose alphabet has no letters for the ‘P’ and ‘V’ sounds. Another challenge is transliterating phonetically typed text, where native-language words are written primarily in the English alphabet. Due to the widespread use of social media and internet-based chat applications, text that mixes English with phonetically typed words has become common. Translating such texts is even more challenging, since the phonetically typed words must first be reverse-transliterated.

This paper focuses on the transliteration of words from English to Bangla. Bangla (also known as Bengali) is the 6th largest language, with over 268 million users¹. Despite this, no advanced MT or transliteration tools are available that address the aforementioned challenges satisfactorily. The best-known literature and systems that have addressed the Bangla transliteration problem are [5]–[7]. Among them, Avro [7] is an open-source implementation, which is also one of the most