Bridging Bangla to Universal Networking Language: A Human Language Neutral Meta-Language

Md. Ershadul H. Choudhury, Nawab Yousuf Ali, Mohammad Zakir Hussain Sarkar, Md. Ahsan Razib
Computer Science & Engineering Dept., East West University, Dhaka, Bangladesh
ershad@ewubd.edu, nawab@ewubd.edu, zakir@ewubd.edu

Abstract

This paper presents the specification of the Universal Networking Language (UNL), a project under the auspices of the United Nations University (UNU), Tokyo, together with a framework for bridging the Bangla language to UNL. The mission of the UNU project is to allow people across nations to access information on the Internet in their own languages. The core of the project is UNL, a language-independent specification that serves as a common medium for documents in different languages. Researchers from different countries involved in this project have been developing UNL systems for their respective native languages. The process basically involves (i) building a native-language-to-UNL dictionary and (ii) deriving language-specific syntactic rules, called analysis rules, for parsing native-language corpora and translating them to UNL. In this paper we present parallel work on a framework for bridging Bangla to UNL, showing procedures for constructing a Bangla-UNL dictionary and for parsing and translating Bangla sentences to UNL. To our knowledge, this is pioneering work for Bangla.

Keywords: Bangla-UNL Dictionary, Morphological Analysis, Universal Networking Language, Universal Words, Hypergraph

I. INTRODUCTION

Although there is an immense proliferation of information on the Internet, it is not accessible to a vast multitude of people across nations, as most of the resources are in English. To overcome this problem, the United Nations launched the Universal Networking Language project [1] in 1996. The result of the project is the Universal Networking Language (UNL), a language-neutral specification, together with a universal parser specification [4].
The goal is to eliminate the massive task of translating between every pair of languages and to reduce language-to-language translation to a one-time conversion to UNL. For example, Bangla corpora, once converted to UNL, can be translated to any other language for which a UNL system has been built. The UNL system does this by representing only the semantics of a native-language sentence in a hypergraph. The Enconverter (parser) converts each native-language sentence to a UNL hypergraph, and the Deconverter translates from the hypergraph to any native language. The hypergraph has a formal textual realization based on English, since English is widely known to experts. The language-specific components (the dictionary and the analysis rules) are developed by researchers across the world. The UNL project currently includes 16 official languages, such as Arabic, Chinese, English, French [7], Russian and Hindi [6]. Bangla is not yet included.

In this paper we present a UNL system for Bangla. The major components of our research work are (i) rules for constructing a Bangla-UNL dictionary in line with the principles of UNL, including the use of morphological analysis, (ii) the development and use of analysis rules, and (iii) a translation (parsing) scheme. In Section II we describe the UNL system. In Sections III and IV we present our main work, which covers all three of these components.

II. UNIVERSAL NETWORKING LANGUAGE SPECIFICATION

UNL [1] has been introduced as a digital meta-language for describing, summarizing, refining, storing and disseminating information in a machine-independent and human-language-neutral form. This meta-language focuses on expressing meanings in a standardized way. Although the UNL specification is available on the UNL website, we think a comprehensive description of it is necessary here. The meaning of a native-language sentence is expressed in the UNL system as a hypergraph composed of nodes connected by semantic relations.
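The hypergraph just described can be pictured as a small data structure. The Python below is a minimal sketch under our own assumptions, not the UNL reference implementation or the Bangla system presented in this paper. It treats each node as a Universal Word (an English headword with a constraint list such as `icl>do`, plus attribute labels such as `@entry`), treats each arc as a labeled binary relation, and prints the graph in the `relation(source, target)` text form. The example encodes the Bangla sentence আমি ভাত খাই ("I eat rice") with the agent relation `agt` and the object relation `obj`. The class names `UW` and `UNLGraph` and the constraint strings are hypothetical and illustrative, not entries from the UNL Knowledge Base. Nested sub-graphs (scopes), which make UNL a hypergraph rather than a plain graph, are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UW:
    """A Universal Word node: English headword, constraints, attribute labels."""
    headword: str
    constraints: tuple[str, ...] = ()  # hypothetical, e.g. ("icl>do",); not from the UNL KB
    attributes: tuple[str, ...] = ()   # attribute labels such as "@entry" or "@present"

    def __str__(self) -> str:
        text = self.headword
        if self.constraints:
            text += "(" + ",".join(self.constraints) + ")"
        # Attribute labels are appended with dots, e.g. eat(icl>do).@entry
        for attr in self.attributes:
            text += "." + attr
        return text


@dataclass
class UNLGraph:
    """Simplified UNL graph: a list of labeled binary arcs (scopes omitted)."""
    relations: list[tuple[str, UW, UW]] = field(default_factory=list)

    def add(self, label: str, source: UW, target: UW) -> None:
        self.relations.append((label, source, target))

    def to_text(self) -> str:
        # One arc per line in the form label(source, target)
        return "\n".join(f"{label}({src}, {dst})" for label, src, dst in self.relations)


# Hypothetical encoding of the Bangla sentence "আমি ভাত খাই" ("I eat rice")
eat = UW("eat", ("icl>do",), ("@entry", "@present"))
me = UW("i", ("icl>person",))
rice = UW("rice", ("icl>food",))

graph = UNLGraph()
graph.add("agt", eat, me)    # agent: who performs the eating
graph.add("obj", eat, rice)  # object: what is eaten
print(graph.to_text())
# agt(eat(icl>do).@entry.@present, i(icl>person))
# obj(eat(icl>do).@entry.@present, rice(icl>food))
```

Keeping the attribute labels on the node rather than modelling them as separate arcs mirrors the way UNL represents function words, such as determiners and auxiliaries, as attributes of UWs.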
Nodes, or Universal Words (UWs), are words borrowed from English and disambiguated by their position in a knowledge base (KB) [1] of conceptual hierarchies. Function words, such as determiners and auxiliaries, are represented as attributes of UWs (nodes) to provide additional information. The core structure of UNL is based on the following elements:

Universal Words: nodes that represent word meanings
Attribute Labels: additional information about the Universal Words
Relation Labels: tags that represent the relationship between Universal Words, i.e., between two nodes; these tags are the arcs of the UNL hypergraph

A. Universal Words

Universal Words are words that constitute the vocabulary of UNL. A UW is not only a unit of the UNL syntactically and semantically for expressing a