Dr. Amir Reza Shahbazkia, International Journal of Computer Science and Mobile Computing, Vol.8 Issue.4, April- 2019, pg. 270-311
© 2019, IJCSMC All Rights Reserved
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IMPACT FACTOR: 6.199
IJCSMC, Vol. 8, Issue. 4, April 2019, pg.270 – 311
Machine Translation by Homograph Detector with the Help of Grammatical Base of Persian Words
Dr. Amir Reza Shahbazkia
1105AmirReza@gmail.com
Abstract: Language is the core medium of communication, and
translation is the core tool for understanding information in
an unknown language. Machine translation helps people
understand information in an unknown language without the
help of a human translator. This study is a brief introduction to
machine translation and a proposed solution for homographs.
Machine translation has been developed for many popular
languages, and much research and development has been
applied to those languages, but a significant problem in
Persian (the language spoken in Iran, Afghanistan, etc.) is detecting
homographs, which is not generally problematic in any
other language except Arabic. Detection of homographs in
Arabic has been extensively studied. However, although Persian and
Arabic share 28 characters and differ in only 4,
they are two quite different languages. Homographs, words
with the same spelling but different translations, are more
problematic to detect in Persian because not all
pronounced vowels are written in the text (only 20% of vowels
are written), so the number of homographs in
Persian is on the order of thousands of times larger than in other
languages except Arabic.
In this paper we propose a new method for analyzing
homographs and finding their exact translation using algorithmic and
grammatical rules.
Keywords: homograph disambiguation, machine translation,
statistical
1. Introduction
A significant problem in Persian (or Farsi) machine translation is
homograph detection and disambiguation. This is not generally
problematic in any other language except Arabic. Although a
large body of work has been done on Arabic homograph detection and
disambiguation with MADA [9], that work is not usable for Persian.
Persian and Arabic are in fact two quite different languages,
although they share 28 characters and differ in only 4.
Since not all pronounced vowels are actually written in
Persian and Arabic text, the two languages share a common
problem in homograph detection and disambiguation, but they require
different solutions.
Moreover, the number of homographs in Persian is on the order of
thousands of times larger than in other languages, except Arabic.
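To make the effect of unwritten vowels concrete, the following minimal sketch groups a few romanized readings by the form they would share once short vowels are dropped. The romanization (short vowels written as a, e, o and the long vowel written as A), the function names, and the five-word lexicon are assumptions made for illustration only; they are not the paper's data or method, and the sketch ignores cases where a short vowel does appear in writing.

```python
from collections import defaultdict

# Assumption: in this toy romanization the short vowels are "a", "e", "o",
# and they are never written. Long vowels use other symbols (e.g. "A").
SHORT_VOWELS = set("aeo")


def written_skeleton(reading: str) -> str:
    """Drop short vowels to mimic a script that leaves them unwritten."""
    return "".join(ch for ch in reading if ch not in SHORT_VOWELS)


def group_homographs(lexicon):
    """Return the written forms that correspond to more than one reading."""
    groups = defaultdict(list)
    for reading, gloss in lexicon:
        groups[written_skeleton(reading)].append((reading, gloss))
    return {form: entries for form, entries in groups.items() if len(entries) > 1}


# Hypothetical mini-lexicon of (romanized reading, English gloss) pairs.
LEXICON = [
    ("mard", "man"),
    ("mord", "died"),
    ("gol", "flower"),
    ("gel", "mud"),
    ("ketAb", "book"),
]

if __name__ == "__main__":
    for form, entries in group_homographs(LEXICON).items():
        print(form, "->", entries)
```

In this toy lexicon, two of the three written forms are ambiguous, so a translator that sees only the written form has to choose between readings by other means, such as grammatical context.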
Persian has 32 characters, of which 29 are consonants
and the rest are vowels, as shown below:
ب Pronounced as b
پ Pronounced as p
ت ط Pronounced as t
ث س ص Pronounced as s
ج Pronounced as j
چ Pronounced as ch=C
ح ه Pronounced as h
خ Pronounced as kh=x
A