3rd National Natural Language Processing Symposium - Building Language Tools and Resources

LFG-Based English-Filipino Translator

Erwin Andrew O. Chan, Chris Ian R. Lim, Richard Bryan S. Tan, Marlon Cromwell N. Tong, Allan B. Borra
College of Computer Studies
De La Salle University
2401 Taft Avenue
1004 Manila, Philippines
(632) 524-0402
{chris.lim, erwin.chan, richard.tan.a, marlon.tong, borraa}@dlsu.edu.ph

ABSTRACT
This paper discusses the proposed architecture for a bidirectional machine translation system using Lexical Functional Grammar (LFG). The LFG-based English-Filipino Translator (L.E.F.T.) allows users to translate documents written in English to Filipino and vice versa. It uses a rule-based method that parses the source text, usually creating an intermediate symbolic representation from which the text in the target language is generated. This method requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of rules. LFG is a formalism in Natural Language Processing that aims to provide a representation that can handle a language's syntactic, lexical, morphological, and semantic information. By using LFG and a highly modular approach to the architecture, the system should be able to harness the descriptive capabilities of LFG while remaining flexible enough to handle changes.

Keywords
Rule Based, Lexical Functional Grammar, Machine Translation, Natural Language Processing, Transfer Rules

I. INTRODUCTION
Machine Translation research has always aimed to improve the quality of the translations produced. This quality usually depends on the algorithms and the resources available. Resources such as electronic lexicons, transfer rules, and formal grammars are very limited for the Filipino language. Because of this, there is a need for an extensible architecture that can handle changes in resources.
Likewise, since the Filipino language is very diverse, existing machine translation algorithms might add further complications when used for English-Filipino translation. LFG promises the descriptive capabilities of transformation-based systems without the need for complex implementations. It relies on a context-free backbone, which generally reduces redundancy in its rules. Using LFG should therefore reduce the complexity of the system [5].

In addition, a less complex system is more extensible. One problem with past systems was that adding further languages was tedious because the system was already too complex. The tendency is for the quality of a system to decrease as more information is added to it. A system that uses LFG would therefore be not only easier to implement, but also easier to extend when the need arises.

Also, although there has been considerable research on Machine Translation, little of it has focused directly on the Filipino language. By studying previous works such as the Filipino Syntax-Semantics Analyzer (FiSSAn) [1] and Translation with Rule Learning (TwiRL) [8], the system aims to build on current research on the Filipino language.

II. RELATED WORKS

1. Filipino Syntax-Semantics Analyzer (FiSSAn)
FiSSAn is a program that accepts Filipino sentences as input and outputs a graphical representation of the analyzed sentences in the form of attribute-value matrices. Each sentence is scanned and analyzed to obtain a surface-level representation, which is in turn used to obtain the semantic-level representation, or f-structure [1].

In the FiSSAn architecture, a sentence is analyzed using a dictionary and a grammar obtained from separate databases. These are used to examine the sentence in order to produce the proper c-structure and then the f-structure. One problem with FiSSAn, however, is that its dictionary and grammar are not very rich.
Further improvement in these aspects is needed for it to be more complete and able to analyze the entire Filipino language [1]. The FiSSAn research has established a formal grammar for declarative sentences of the Filipino language that accounts for the free word order phenomenon. Moreover, it established a lexicon architecture that includes semantic information for general grammatical categories such as nouns, verbs, adverbs, and adjectives.

2. Lexical-Functional Transfer (LFT)
LFT is a transfer framework for machine translation systems based on LFG. The framework specifies transfer rules with LFG schemata, which incorporate corresponding lexical functions of two different languages into an equational representation. The transfer process therefore consists of solving equations, called target f-descriptions, that are derived from the transfer rules applied to the source f-structure, and then producing a target f-structure [6].
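
The idea of deriving target f-descriptions from a source f-structure and then solving them can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the LFT formalism of [6]: the f-structure is modeled as a nested dictionary, each "equation" is simply a (path, value) pair, the only transfer rule is a PRED lookup, and the names PRED_LEXICON, transfer_equations, and solve, as well as the sample lexicon entries, are hypothetical.

```python
# Toy illustration only. The rule format, the lexicon entries, and the helper
# names are assumptions made for this sketch, not the LFT formalism from [6].

# Hypothetical bilingual PRED lexicon (English -> Filipino root forms).
PRED_LEXICON = {"eat": "kain", "child": "bata", "mango": "mangga"}


def transfer_equations(source, path=()):
    """Walk a source f-structure and emit target f-description equations.

    Each equation is a (path, value) pair that reads as
    "the target f-structure has `value` at `path`".
    """
    equations = []
    for attr, value in source.items():
        if isinstance(value, dict):
            # Embedded f-structure (e.g. SUBJ, OBJ): recurse with a longer path.
            equations.extend(transfer_equations(value, path + (attr,)))
        elif attr == "PRED":
            # Lexical transfer: replace the source predicate with its counterpart.
            equations.append((path + (attr,), PRED_LEXICON[value]))
        else:
            # Other atomic features are copied unchanged in this sketch.
            equations.append((path + (attr,), value))
    return equations


def solve(equations):
    """Build the smallest nested dict that satisfies every equation."""
    target = {}
    for path, value in equations:
        node = target
        for attr in path[:-1]:
            node = node.setdefault(attr, {})
        last = path[-1]
        if last in node and node[last] != value:
            # Two equations assign different values to the same attribute.
            raise ValueError(f"inconsistent equations at {path}")
        node[last] = value
    return target


# Simplified source f-structure for an English sentence such as
# "The child ate a mango" (feature names chosen for illustration).
source_f = {
    "PRED": "eat",
    "TENSE": "past",
    "SUBJ": {"PRED": "child", "NUM": "sg"},
    "OBJ": {"PRED": "mango", "NUM": "sg"},
}

target_f = solve(transfer_equations(source_f))
print(target_f)
```

Keeping equation generation separate from equation solving mirrors the two steps described above, and the consistency check in solve stands in for the requirement that a well-formed f-structure assign at most one value to each attribute. A real transfer component would also need rules that restructure grammatical functions, which this sketch does not attempt.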