Procedia Technology 00 (2011) 000–000
www.elsevier.com/locate/procedia
2nd International Conference on Communication, Computing & Security
A framework for translating English text into Malayalam using statistical models
Mary Priya Sebastian [a,*], Sheena Kurian K [b], G. Santhosh Kumar [a,b]
[a] Asst. Professor, Dept. of Computer Science, Rajagiri School of Engg. & Technology, Kochi-682039, Kerala, India
[b] Asst. Professor, Dept. of Computer Science, KMEA College of Engg. & Technology, Kochi-682039, Kerala, India
[a,b] Professor, Dept. of Computer Science, Cochin University of Science and Technology, Kochi-682039, Kerala, India
Abstract
This paper discusses a methodology for translating text from English into the Dravidian language Malayalam using
statistical models. In the training phase, the translator uses a monolingual Malayalam corpus and a bilingual
English/Malayalam corpus; it then automatically generates the Malayalam translation of an unseen English sentence.
Various techniques that improve the alignment model by incorporating morphological inputs into the bilingual
corpus are discussed. Removing insignificant alignments from the sentence pairs in this way yields better training
results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination from
the bilingual corpus also prove effective in producing better alignments. Difficulties in the translation process
that arise from the structural differences between English and Malayalam are resolved in the decoding phase by
applying order conversion rules. The handcrafted rules designed for suffix separation, which can serve as a
guideline for implementing suffix separation in Malayalam, are also presented. Experiments conducted on a sample
corpus generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and
WER evaluation metrics.
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of [name organizer]
Keywords: Alignment; English Malayalam Translation; PoS Tagging; Statistical Machine Translation; Suffix Separation
* Mary Priya Sebastian. Tel.: +91-484-2427835; fax: +91-484-2426241
E-mail address: marypriya_s@rajagiritech.ac.in