Proceedings of the International Conference on Computer and Communication Engineering 2008 May 13-15, 2008 Kuala Lumpur, Malaysia
978-1-4244-1692-9/08/$25.00 ©2008 IEEE
Specific Features of a Converter of Web Documents from Bengali to
Universal Networking Language
Md. Nawab Yousuf Ali¹, Jugal Krishna Das², S. M. Abdullah Al-Mamun³, Md Ershadul H. Choudhury⁴
¹Department of CSE, East West University, Dhaka, Bangladesh
²Department of CSE, Jahangirnagar University, Dhaka, Bangladesh
³Department of CSE, Ahsanullah University of Science and Technology, Dhaka, Bangladesh
⁴EEE Department, American International University-Bangladesh, Dhaka, Bangladesh
Email: nawab@ewubd.edu
Abstract
In this paper, we present a workable structure
along with characteristic features of a subsystem that
may become an integral part of a Language Server
bridging Bengali and the Universal Networking
Language (UNL). We try to assimilate the results of
the research efforts of the UNL community and also of
various machine translation projects. Vast information
resources in different languages are available on the
Internet, but they cannot be shared because of the
language barrier. The UNL community
is set to devise an effective and efficient system to
diminish that barrier, with the ultimate aim of allowing
automatic conversion of web-based resources in one
member language into another member language.
A good number of researchers in computational
linguistics all over the world have already joined
hands with the UNL initiators, and research groups
representing most widely used natural languages are
working intensively for the purpose. This paper
demonstrates our pioneering efforts in the field of
Bengali (Bangla). Here we outline a possible
Bangla-UNL dictionary, feature an annotation editor
for Bangla texts, infer significant morphological,
syntactic and semantic rules for parsing Bangla web
documents in connection with conversion to the UNL,
and show possible ways of future contribution towards
the goal.
Keywords: Universal Networking Language
(UNL), Universal Words (UW), Bangla-UNL
Dictionary, Morphological Analysis, Hypergraph,
Enconverter, Deconverter.
I. INTRODUCTION
Nations are becoming more interdependent and
need to exchange information, yet the language barrier
hinders progress at the individual, institutional and
national levels. Translation is the only means of
disseminating information across languages, and it
requires much effort and involves direct and indirect costs.
That is why vast information resources in different
languages cannot be shared. Knowledge and information
scattered all over the world remain mostly inaccessible
because they lack a machine-readable representation
and because of the language barrier. The
United Nations University/Institute of Advanced
Studies (UNU/IAS) reviewed the internationally
available machine translation (MT) systems and devised a
more efficient, effective and suitable
technique to develop a human-language-neutral
meta-language for the Internet. The result of this project is the UNL
(Universal Networking Language), launched in
1996 [2]. It is an artificial language in the form of a
semantic network that lets computers express and
exchange every kind of information. Since the advent of
computers, researchers around the world have worked
towards a system that would overcome language
barriers. The goal of the UNL system is to eliminate the
massive task of translating between two languages and
to reduce translation from one language to another to a
one-time conversion to UNL. For example,
Bangla corpora, once converted to UNL, can be
translated to any other language for which a UNL system
has been built, as shown in Figure 1.
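The saving from using an interlingua can be made concrete with a little arithmetic. The sketch below is only an illustration and is not taken from the UNL specification; the function names are hypothetical. It assumes that direct translation needs one translator per ordered pair of languages, while a pivot such as UNL needs one enconverter (native language to UNL) and one deconverter (UNL to native language) per language.

```python
def direct_translators(n_languages: int) -> int:
    """Translators needed when every ordered language pair is handled directly."""
    return n_languages * (n_languages - 1)


def pivot_converters(n_languages: int) -> int:
    """Converters needed with a pivot language (assumed here to play the role of UNL):
    one enconverter plus one deconverter per language."""
    return 2 * n_languages


if __name__ == "__main__":
    # Compare both approaches for a few community sizes.
    for n in (2, 3, 10, 16):
        print(f"{n} languages: direct={direct_translators(n)}, "
              f"pivot={pivot_converters(n)}")
```

Under these assumptions the direct approach grows quadratically with the number of languages and the pivot approach grows linearly, so adding Bangla to an existing UNL community would require only one new enconverter and one new deconverter.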
The UNL system does this by representing only the
semantics of a native language sentence in a
hypergraph having Universal Words (UWs) as nodes
and relations as arcs. This hypergraph is also
represented as a set of directed binary relations, each