UNL Document Summarization

Virach Sornlertlamvanich, Tanapong Potipiti and Thatsanee Charoenporn
National Electronics and Computer Technology Center,
National Science and Technology Development Agency,
Ministry of Science, Technology and Environment,
22nd Floor, Gypsum Metropolitan Tower, 539/2 Sriayudhya Rd., Rajthevi, Bangkok 10400, Thailand.
Email: {virach, tanapong, thatsanee}@nectec.or.th

Abstract
This paper proposes an approach to UNL document summarization. Our approach uses both the surface information and the semantic information of the UNL annotation to summarize documents. Thanks to the semantic annotation of UNL, the essence of a document is collected efficiently, which facilitates the abstraction function for language generation. Multilinguality can also be realized through the language deconverters, which generate the target languages from the summarized UNL document under the UNL framework. The experimental results show that using the UNL annotation improves summarization quality compared with using the original plain text.

Introduction
The UNL project ([8]) has been conducted under the aegis of the United Nations University, Japan, since 1996. It is a collaboration among research institutions from 16 countries. UNL aims to be an international semantic annotation standard for network-oriented multilingual communication. The UNL framework provides a means of representing the meaning of a natural language document with a set of semantic graphs.

This paper introduces a method for summarizing UNL documents that produces better summaries. Rather than relying only on superficial information, we directly process the UNL semantic information to extract the essence of the document. Our work shows that using the UNL annotation improves summarization quality.
1 UNL specification
Existing interlingua-based machine translation systems translate a source language into an interlingua and then translate the interlingua into the target language. Errors made while creating the interlingua propagate to target language generation. This drawback of the interlingual approach has impeded its practical use. To improve translation accuracy, the UNL project proposes a new paradigm in which users directly prepare interlingual documents, called UNL documents, as the source documents. As a result, the source for target language generation is a flawless interlingua. To support this framework, UNL documents are designed to contain no semantic ambiguities.

UNL is a project for multilingual network communication initiated by the United Nations University, Japan. UNL is based on an interlingual approach represented by a hypergraph. A UNL graph consists of nodes and links. A node is formed by a universal word (UW) with a list of attached attributes (such as @entry, indicating the entry node of the UNL graph; @pl, indicating the plurality of the concept; and @def, indicating the definiteness of the concept). A link is a directed arc labeled with the semantic relation between the two nodes it connects. A UNL document is a text encoding a set of UNL graphs. More details on UNL can be found in [1], [4], [5] and [8]. Figures 1 and 2 show examples of a UNL graph and of UNL text.

2 Universal words
A UW denotes an interlingual acceptation used for concept representation in UNL. In theory, a UW has only one meaning; in other words, UWs do not allow semantic ambiguity. English words are used to construct UWs because (i) English is known by all UNL developers, and (ii) many good bilingual dictionaries between local languages and English are available ([5]). A UW is expressed as:
“<headword>(<list of restrictions>)”
e.g. book(icl>do,obj>room).
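The node/link structure and the UW expression described above can be sketched as plain Python data types. This is a minimal illustration under our own assumptions, not part of the UNL specification or of our summarization system: the names `UW`, `Node`, `Link`, `UNLGraph` and `parse_uw` are hypothetical, and the parser only handles the flat “<headword>(<rel>><class>,...)” form, splitting each restriction at its first “>” and ignoring nested parentheses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UW:
    """A universal word: an English headword plus (relation, class) restrictions."""
    headword: str
    restrictions: tuple[tuple[str, str], ...] = ()


def parse_uw(text: str) -> UW:
    """Parse the flat form '<headword>(<rel>><class>,...)'.

    Assumption (ours): restrictions are comma-separated, contain no nested
    parentheses, and each one is split at its first '>'.
    """
    text = text.strip()
    if "(" not in text:
        return UW(text)  # bare headword without restrictions
    if not text.endswith(")"):
        raise ValueError(f"malformed UW: {text!r}")
    head, _, body = text[:-1].partition("(")
    restrictions = []
    for part in body.split(","):
        rel, sep, cls = part.partition(">")
        if not sep:
            raise ValueError(f"restriction without '>': {part!r}")
        restrictions.append((rel.strip(), cls.strip()))
    return UW(head.strip(), tuple(restrictions))


@dataclass
class Node:
    """A graph node: one UW with its attributes, e.g. {'@entry', '@pl'}."""
    uw: UW
    attributes: set[str] = field(default_factory=set)


@dataclass
class Link:
    """A directed arc labeled with a semantic relation between two nodes."""
    relation: str
    source: int  # index of the source node in UNLGraph.nodes
    target: int  # index of the target node in UNLGraph.nodes


@dataclass
class UNLGraph:
    nodes: list[Node] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)

    def add_node(self, uw: str, *attrs: str) -> int:
        """Parse the UW string, store it with its attributes, return its index."""
        self.nodes.append(Node(parse_uw(uw), set(attrs)))
        return len(self.nodes) - 1

    def link(self, relation: str, source: int, target: int) -> None:
        self.links.append(Link(relation, source, target))

    def entry(self) -> Node:
        """Return the first node carrying @entry (assumes one exists)."""
        return next(n for n in self.nodes if "@entry" in n.attributes)


# Usage: a hypothetical two-node graph for "book a room".
g = UNLGraph()
reserve = g.add_node("book(icl>do,obj>room)", "@entry")
room = g.add_node("room", "@def")
g.link("obj", reserve, room)
print(g.entry().uw.headword, [(lk.relation, lk.source, lk.target) for lk in g.links])
```

A UNL document would then be a list of such graphs. Storing links as node indices keeps the graph easy to traverse from the @entry node, which is one plausible starting point when collecting the essence of a sentence; this design is our illustration only.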
Restrictions are composed of the following constraints:
1) icl (stands for inclusion) is the restriction defining the semantic class in which the UW is included. Part of the UNL class hierarchy is shown in Figure 3. For example, “car(icl>movable thing)” indicates that this UW belongs to the class “movable thing”.
2) Any semantic relation available for UNL arcs can be combined with a UNL class name to restrict the meaning of the English headword.
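Because icl places a UW inside the class hierarchy, testing whether a UW belongs to a class amounts to walking up that hierarchy from its icl class. The sketch below assumes a tiny hand-written child-to-parent fragment, `PARENT`, whose entries are illustrative and do not reproduce Figure 3; the functions `restrictions`, `ancestors` and `is_a` are hypothetical names of our own.

```python
# Hypothetical child -> parent fragment of a UNL-style class hierarchy.
# Illustrative only; it is NOT the hierarchy shown in Figure 3.
PARENT = {
    "movable thing": "concrete thing",
    "concrete thing": "thing",
    "abstract thing": "thing",
}


def restrictions(uw: str) -> dict[str, str]:
    """Map each restriction label to its class, e.g. {'icl': 'movable thing'}.

    Assumes the flat '<headword>(<rel>><class>,...)' form without nesting;
    a repeated label keeps only its last class.
    """
    _, sep, body = uw.strip().partition("(")
    if not sep:
        return {}  # bare headword: no restrictions
    pairs = (part.partition(">") for part in body.rstrip(")").split(","))
    return {rel.strip(): cls.strip() for rel, _, cls in pairs}


def ancestors(cls: str) -> list[str]:
    """Return cls followed by every class above it (PARENT must be acyclic)."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(uw: str, cls: str) -> bool:
    """True if the UW's icl class is cls or lies below cls in the hierarchy."""
    icl = restrictions(uw).get("icl")
    return icl is not None and cls in ancestors(icl)


print(is_a("car(icl>movable thing)", "thing"))
print(restrictions("book(icl>do,obj>room)"))
```

The same `restrictions` mapping also exposes the second kind of constraint: in book(icl>do,obj>room), the pair obj>room combines the semantic relation obj with a class name to narrow the headword's meaning.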