Proceedings of the International Conference on Computer and Communication Engineering 2008
May 13-15, 2008, Kuala Lumpur, Malaysia
978-1-4244-1692-9/08/$25.00 ©2008 IEEE

Specific Features of a Converter of Web Documents from Bengali to Universal Networking Language

Md. Nawab Yousuf Ali 1, Jugal Krishna Das 2, S. M. Abdullah Al-Mamun 3, Md Ershadul H. Choudhury 4

1 Department of CSE, East West University, Dhaka, Bangladesh
2 Department of CSE, Jahangirnagar University, Dhaka, Bangladesh
3 Department of CSE, Ahsanullah University of Science and Technology, Dhaka, Bangladesh
4 EEE Department, American International University-Bangladesh, Dhaka, Bangladesh
Email: nawab@ewubd.edu

Abstract

In this paper, we present a workable structure, along with the characteristic features, of a subsystem that may become an integral part of a Language Server bridging Bengali and the Universal Networking Language (UNL). We try to assimilate the results of the research efforts of the UNL community and of various machine translation projects. Vast information resources in different languages are available on the Internet, but they cannot be shared, largely because of the language barrier. The UNL community aims to devise an effective and efficient system to lower that barrier, with the ultimate goal of automatically converting web-based resources in one member language into any other member language. Many researchers in computational linguistics around the world have already joined the UNL initiators, and research groups representing the most widely used natural languages are working intensively toward this goal. This paper presents our pioneering efforts for Bengali (Bangla).
Here we outline a possible Bangla-UNL dictionary, present an annotation editor for Bangla texts, derive significant morphological, syntactic and semantic rules for parsing Bangla web documents for conversion to UNL, and indicate possible directions for future contributions toward this goal.

Keywords: Universal Networking Language (UNL), Universal Words (UW), Bangla-UNL Dictionary, Morphological Analysis, Hypergraph, EnConverter, DeConverter.

I. INTRODUCTION

As nations become more interdependent and need to exchange information, the language barrier hinders progress at the individual, institutional and national levels. Translation is the only means of disseminating information across languages, and it demands considerable effort as well as direct and indirect costs. As a result, vast information resources in different languages cannot be shared. Knowledge and information scattered all over the world remain largely inaccessible because of the language barrier and the lack of machine-processable representations. The United Nations University/Institute of Advanced Studies (UNU/IAS) reviewed the internationally available machine translation (MT) systems and devised a more efficient, effective and suitable technique: a meta-language for the Internet that is neutral with respect to any human language. The result of this project is the Universal Networking Language (UNL), launched in 1996 [2]. UNL is an artificial language, in the form of a semantic network, that allows computers to express and exchange every kind of information.

Since the advent of computers, researchers around the world have worked toward a system that would overcome language barriers. The goal of such a system is to eliminate the massive task of translating between every pair of languages and to reduce it to a one-time conversion of each language into UNL. For example, Bangla corpora, once converted to UNL, can be translated into any other language for which a UNL system has been built, as shown in Figure 1.
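The saving promised by a one-time conversion into UNL can be made concrete with simple counting. The sketch assumes that direct MT needs one system per ordered (source, target) language pair, while a UNL-based setup needs one EnConverter (language to UNL) and one DeConverter (UNL to language) per language; both counting rules and the function names are simplifying assumptions for illustration, not figures from this paper.

```python
def pairwise_translators(n: int) -> int:
    """Direct MT (assumed): one system per ordered (source, target) pair."""
    return n * (n - 1)


def interlingua_converters(n: int) -> int:
    """UNL-style interlingua (assumed): one EnConverter plus one DeConverter per language."""
    return 2 * n


if __name__ == "__main__":
    # Compare how the two approaches scale as languages are added.
    for n in (2, 3, 10, 50):
        print(f"{n:>3} languages: direct={pairwise_translators(n):>5}, "
              f"via UNL={interlingua_converters(n):>4}")
```

Under these assumptions the two approaches break even at three languages (six systems each); with 50 languages, direct MT would need 2450 systems against 100 converters, which is why adding Bangla to UNL only requires building the Bangla side.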
The UNL system achieves this by representing only the semantics of a native-language sentence as a hypergraph, with Universal Words (UWs) as nodes and relations as arcs. This hypergraph can also be represented as a set of directed binary relations, each