Vol 8. No. 2 June, 2015
African Journal of Computing & ICT
© 2015 Afr J Comp & ICT – All Rights Reserved - ISSN 2006-1781
www.ajocict.net
A New Approach: Automatically Identify Proper Noun from Bengali Sentence
for Universal Networking Language
M. S. Islam & J.K. Das
Computer Science and Engineering
Jahangirnagar University
Savar, Dhaka-1213, Bangladesh
Phone: +8801916574623, email: syefulislam@yahoo.com
Phone: +8801712509082, email: drdas64@yahoo.com
ABSTRACT
Hundreds of millions of people, of almost all levels of education and attitudes and from different countries, communicate with
each other for different purposes using various languages. Machine translation is in high demand because of the increasing use of
web-based communication. One of the major problems in Bengali translation is identifying a naming word (proper noun) in a sentence,
which is relatively simple in English because such entities start with a capital letter. Bangla has no concept of small
or capital letters, and a huge number of different named entities exist in Bangla. It is therefore difficult to determine
whether a word is a proper noun or not. Here we introduce a new approach to identify proper nouns in a Bengali sentence
for UNL without storing a huge number of named entities in the word dictionary. The goal is to make conversion of Bangla sentences
to UNL, and vice versa, possible with a minimal number of words stored in the dictionary.
Keywords- UNL; Rule based analysis; Morphological Analysis; Post Converted; Head word; Knowledge base.
African Journal of Computing & ICT Reference Format:
M. S. Islam & J.K. Das (2015). A New Approach: Automatically Identify Proper Noun from Bengali Sentence for Universal Networking
Language. Afr J. of Comp & ICTs. Vol 8, No. 2. Pp 97-106.
I. INTRODUCTION
Today the demand for communication among people of all
levels in various countries has increased greatly. This
globalization trend calls for a homogeneous platform so
that each member of the platform can understand what the
others convey and carry on the discussion smoothly.
However, language barriers throughout the world continue
to prevent the whole world from congregating
into a single domain of shared knowledge and information.
Researchers therefore work on various languages and try to
provide a platform where multilingual people can communicate
through their native languages. Researchers analyze
language structure and formulate structural grammars and rules
that are used to translate one language into another. Very
early on, the Indian linguist Panini proposed vyaakaran (a
set of rules by which the language is analyzed) and gave the
structure of the Sanskrit language. After the era of Panini,
various linguists worked on language and proposed various
techniques.
The most modern theory, however, is universal grammar,
proposed by the American linguist Noam Chomsky, which is the
basis of modern language translation programs. Over the last
few years, several language-specific translation systems have
been proposed. Since these systems are based on specific
source and target languages, they have their own limitations.
As a consequence, the United Nations University/Institute of
Advanced Studies (UNU/IAS) decided to develop an
inter-language translation program [1].
Their continuous research led to a common
form of languages known as the Universal Networking Language
(UNL) and to the introduction of the UNL system. The UNL
system is an initiative to overcome the problem of language
pairs in automated translation. UNL is an artificial language
based on the interlingua approach. It acts as an intermediate
computer semantic language through which any text written
in a particular language can be converted to text in any
other language [2]-[3].
The UNL system consists of three major components: language
resources, software for processing language resources
(the parser), and supporting tools for maintaining and operating
the language processing software or developing language
resources. The parser of the UNL system takes an input sentence,
parses it based on rules, and converts it into the corresponding
universal words from the word dictionary. The challenge in
detecting named entities is that such expressions are hard to
analyze using UNL because they belong to the open class of
expressions, i.e., there is an infinite variety and new
expressions are constantly being invented. Bengali is the
seventh most popular language in the world, the second in India,
and the national language of Bangladesh. This is therefore an
important problem: the UNL dictionary is searched for proper
nouns, yet proper nouns (names) cannot all be
exhaustively maintained in the dictionary for automatic
identification.
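
The capitalization cue mentioned in the abstract can be illustrated with a minimal Python sketch. It is not part of the proposed method: the function name `capitalized_tokens` and both sample sentences are hypothetical, chosen only to show that a case-based heuristic that works on English text finds nothing in Bengali script, which has no upper or lower case.

```python
def capitalized_tokens(sentence: str) -> list[str]:
    """Return tokens after the first one that begin with an uppercase letter.

    The first token is skipped because English capitalizes every
    sentence-initial word, proper noun or not.
    """
    tokens = sentence.split()
    return [tok for tok in tokens[1:] if tok[0].isupper()]


# Hypothetical example sentences (illustration only).
english = "Yesterday Karim went to Dhaka"
bengali = "করিম বই কিনল"  # roughly "Karim bought a book"

# English: the cue picks out the two names.
print(capitalized_tokens(english))

# Bengali: no character is cased, so the cue finds no candidates,
# even though the sentence contains the name "করিম".
print(capitalized_tokens(bengali))
```

The English call returns the two names, while the Bengali call returns an empty list. Note that the skipped first token of the Bengali sentence is itself the name, and it would be missed even if it were not skipped, since `str.isupper()` is false for every Bengali character.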