Hybrid Machine Aided Translation System based on Constraint Synchronous Grammar and Translation Corresponding Tree

Fai Wong, Francisco Oliveira, Yiping Li
Faculty of Science and Technology, University of Macau, Macao
Email: {derekfw, olifran, ypli}@umac.mo

Abstract: As the demand for translating large volumes of material between Portuguese and Chinese is increasing rapidly, especially in the city of Macau, translation work becomes impractical without the support of effective tools. To fill this gap and provide a translation workbench environment for translators, a Machine Aided Translation System between these languages, PCTAssist, is introduced. It is a hybrid system that applies not only Translation Memory technology but also Machine Translation methodologies, including the annotation schema of the Translation Corresponding Tree (TCT) for representing bilingual examples, and the language formalism Constraint Synchronous Grammar (CSG) for analyzing the syntactic structure shared between the two languages to accomplish the translation task.

Index Terms: Machine Translation, Constraint Synchronous Grammar, Translation Corresponding Tree

I. INTRODUCTION

Advances in computer technology have changed many aspects of daily life. As more documents have to be translated every day, human translation becomes impractical without the help of computer tools. These include electronic dictionaries, terminology corpora, translation memory, and automatic translation. Such tools are often combined into integrated systems that support the translator's daily work, and these systems are classified as either Automatic Machine Translation (MT) systems or Machine Aided, also called Computer Assisted, Translation (CAT) systems. Automatic MT systems generate the translation from the information in a knowledge base without human intervention.
CAT systems, on the other hand, first produce a preliminary translation, and translators then make the necessary changes depending on its quality. A large number of such systems are available on the market. They differ in the supported file formats, languages, operating systems, functions provided, and price. A list of these systems can be found in [1].

Moreover, different approaches to MT have been proposed in the literature. The Rule based MT approach [2] relies on a set of linguistic grammar rules to handle the translation, and can be categorized as Direct, Transfer based, or Interlingua based. These categories differ in the definition of the linguistic context, the knowledge used, and the number of stages needed to translate a sentence. Direct approaches handle only word by word translation and ignore all syntactic and semantic information. The Transfer based approach has three modules: the analyzer module analyzes the source text and converts it into an intermediate representation; the transfer module maps this representation into a target language structure based on a set of conversion rules; and the generation module synthesizes the transferred representation into the corresponding target language text. The Interlingua approach has no transfer module: the source text is analyzed into a language-independent representation, from which the target text is generated directly.

Example based MT [3][4] analyzes pieces of bilingual examples stored in parallel corpora to generate the translation. However, its results often depend on the quality of the examples and on the similarity function applied.

As more digitized resources become available, Statistical approaches [5][6] have become a new research trend. These approaches take into account probabilities, estimated from the corpora, for the translation of words and for the ordering of words in sentences. Their accuracy is often highly dependent on the information contained in the digitized resources.
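To make the contrast between the Direct and Transfer based designs described above more concrete, the following is a minimal sketch of the three-module pipeline (analyzer, transfer, generation). It is only an illustration under stated assumptions, not the PCTAssist implementation: the toy lexicon, the part-of-speech tags, the single reordering rule (Portuguese noun + adjective becomes Chinese adjective + noun), and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon (an assumption for illustration only):
# Portuguese word -> (Chinese translation, part-of-speech tag)
LEXICON = {
    "livro": ("书", "N"),
    "casa": ("房子", "N"),
    "vermelho": ("红色", "ADJ"),
    "grande": ("大", "ADJ"),
}


@dataclass
class Node:
    """One unit of the intermediate representation: a tagged source word."""
    tag: str
    word: str


def direct(source: str) -> str:
    """Direct approach: word by word substitution, no syntactic analysis."""
    return "".join(LEXICON[w][0] for w in source.lower().split())


def analyze(source: str) -> List[Node]:
    """Analyzer module: convert source text into a tagged representation."""
    return [Node(LEXICON[w][1], w) for w in source.lower().split()]


def transfer(nodes: List[Node]) -> List[Node]:
    """Transfer module: apply one conversion rule, N ADJ -> ADJ N."""
    out = list(nodes)
    i = 0
    while i < len(out) - 1:
        if out[i].tag == "N" and out[i + 1].tag == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the pair that was just reordered
        else:
            i += 1
    return out


def generate(nodes: List[Node]) -> str:
    """Generation module: substitute target words and insert the particle
    '的' between an adjective and the noun it modifies."""
    parts = []
    for i, node in enumerate(nodes):
        parts.append(LEXICON[node.word][0])
        if node.tag == "ADJ" and i + 1 < len(nodes) and nodes[i + 1].tag == "N":
            parts.append("的")
    return "".join(parts)


def translate(source: str) -> str:
    """Run the three modules in sequence."""
    return generate(transfer(analyze(source)))


if __name__ == "__main__":
    print(direct("livro vermelho"))     # keeps the Portuguese word order
    print(translate("livro vermelho"))  # reordered, with the particle added
```

In this sketch the Direct function keeps the source word order, while the Transfer based pipeline reorders the adjective and inserts the particle. A realistic system would need a proper parser and a far richer rule set than the single rule assumed here.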
Each of these approaches has its own strengths and weaknesses when used alone to develop an MT system. Combining them leads to a hybrid system that avoids the intrinsic limitations of the individual translation methods [7][8].

Although such tools are easy to find, no practical, commercial Machine Aided Translation System has been developed specifically for Portuguese and Chinese. These two languages play an important role in the city of Macau, where both are official languages. In this paper, a hybrid Machine Aided Portuguese Chinese system, PCTAssist, is presented. The system targets Portuguese and Chinese and provides a helpful translation tool for translators who work with these languages. Moreover, the system is designed to combine the advantages of the Rule based and Example based approaches while reducing their disadvantages.

Since Portuguese and Chinese come from different language families, they differ greatly in writing and grammar. Table 1 shows some common non-standard linguistic relationships between them.

JOURNAL OF COMPUTERS, VOL. 7, NO. 2, FEBRUARY 2012 309 © 2012 ACADEMY PUBLISHER doi:10.4304/jcp.7.2.309-316