Journal of Research in Science, Computing and Engineering (JRSCE)

Extracting and Using Translation Templates in an Example-Based Machine Translation System

Ethel Ong, Kathleen Go, Manimin Morga, Vince Nunez, Francis Veto
College of Computer Studies, De La Salle University – Manila
onge@dlsu.edu.ph

Keywords: Template Learning, Example-Based Machine Translation, Bilingual Corpora

A bidirectional English-Filipino machine translation system is developed that extracts translation templates and chunks from a given bilingual English-Filipino corpus. These templates and chunks are then used to translate an input English document to Filipino and vice versa. The system extends the similarity and difference translation template learning algorithms of Cicekli and Guvenir (2003) by refining existing templates and deriving templates from previously learned chunks. Chunk alignment, splitting algorithms, and chunk refinement are also introduced in the training process. Correct extraction of similarity templates and chunks during the learning process led to translations with word error rates ranging from a low of 15%, for a test document whose sentences exactly match the training set, to a high of 86%, when the test document differs from the training corpus. Using difference templates alone, the resulting translations have word error rates of 49% to 85%. Combined use of similarity and difference templates resulted in word error rates ranging from a low of 18%, when the test document contains sentence patterns matching the training set, to a high of 85%, when the test document differs from the training corpus. Tests also showed that the translation with the highest score, selected from a set of candidate translations, is consistently the best choice when validated against automatic evaluation methods.
1.0 INTRODUCTION

Computer-based machine translation systems provide a means for people to exchange information more efficiently by allowing individuals to translate a body of text from one language to another. Translation between natural languages has been a major area of study in natural language processing. Machine translation systems have been built using two primary approaches: rule-based machine translation (RBMT), which makes use of the rules of the language, and example-based machine translation (EBMT), which relies on the information fed into the system, most of which is taken from aligned bilingual corpora.

RBMT systems parse documents and use a symbolic representation to generate the translation output. They require a large set of rules and extensive lexicons with morphological, syntactic, and semantic information to perform their task. Furthermore, these systems accumulate more and more rules over time, which increases their complexity and makes them quite difficult to maintain (Hovy et al., 2001).

EBMT systems learn how a certain sentence is to be translated by being trained on a given bilingual corpus, which contains a set of sentences in the source language, each with a corresponding translation in the target language. Correspondences between the sentences are learned by the system and subsequently used in translation. Translations are generated by comparing the input in the source language to the examples or templates stored in a database and finding the best possible match before deducing the equivalent text in the target language.
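
The template-matching step described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the templates, the chunks, the variable notation (X1), the function names, and the scoring rule (number of constant words matched) are all assumptions made for illustration, and the example sentences are not taken from the paper's corpus.

```python
import re

# Hypothetical toy data for illustration only (not from the paper's corpus).
# Each template pairs a source pattern with a target pattern; X1 is a variable.
TEMPLATES = [
    ("I bought a X1", "Bumili ako ng X1"),
    ("X1 is beautiful", "Maganda ang X1"),
]
# Hypothetical chunk table mapping source chunks to target chunks.
CHUNKS = {"book": "libro", "the house": "bahay"}

VARIABLE = re.compile(r"X\d+")


def template_to_regex(source_template):
    """Build a regex in which every variable token becomes a named group."""
    parts = []
    for token in source_template.split():
        if VARIABLE.fullmatch(token):
            parts.append("(?P<%s>.+)" % token)
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def translate(sentence):
    """Return the highest-scoring candidate translation, or None."""
    candidates = []
    for source, target in TEMPLATES:
        match = template_to_regex(source).fullmatch(sentence.strip())
        if match is None:
            continue
        # Translate every variable binding through the chunk table.
        bindings = {}
        for name, text in match.groupdict().items():
            chunk = CHUNKS.get(text.lower())
            if chunk is None:
                break
            bindings[name] = chunk
        else:
            output = " ".join(bindings.get(tok, tok) for tok in target.split())
            # Assumed scoring rule: count of constant (non-variable) words.
            score = sum(1 for tok in source.split()
                        if not VARIABLE.fullmatch(tok))
            candidates.append((score, output))
    if not candidates:
        return None
    return max(candidates)[1]


print(translate("I bought a book"))
print(translate("The house is beautiful"))
```

In this sketch, a sentence that matches no stored template, or whose variable bindings are missing from the chunk table, yields no translation; a fuller system would fall back to other templates or to partial matches.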