Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xian, 15-17 July, 2012
KOREAN-CHINESE STATISTICAL TRANSLATION MODEL
SHUO LI, DEREK F. WONG, LIDIA S. CHAO
Department of Computer and Information Science, University of Macau, Macau SAR, China
E-MAIL: mb05486@umac.mo, derekw@umac.mo, lidiasc@umac.mo
Abstract: Korean and Chinese belong to different language families, and there has been very little research on statistical machine translation between them. The word order of the two languages is quite different, and Korean is considered a morphologically rich language when compared to Chinese. Hence, translating Korean into Chinese requires more linguistic knowledge to achieve a better translation result. This paper presents a Korean-to-Chinese machine translation system that incorporates different linguistic data of Korean into the translation model. A state-of-the-art factored translation model is employed to verify the proposed approach, which is efficient not only for European languages but also for Korean and Chinese. Experimental results provide solid evidence that the proposed method achieves better performance by integrating different types of linguistic information.
Keywords: Statistical Machine Translation; Korean-Chinese; Factored Translation Model
1. Introduction
In recent decades, machine translation (MT) has developed rapidly and has been successfully applied to translation in various domains. Among the variety of MT approaches, statistical machine translation (SMT) has become a promising direction and receives the most attention in the community. SMT uses statistical techniques to translate a source language into a target language, without relying on hand-crafted rules for specific languages. There has been much successful and valuable SMT research on European languages, but there are very few studies on Asian language pairs such as Korean-Chinese.
978-14673-1487-9/12/$31.00 ©2012 IEEE
Korean and Chinese belong to different language families. Meanwhile, their morphologies also differ from those of European languages. Korean is a typical subject-object-verb (SOV) language, while Chinese is a subject-verb-object (SVO) language. Consider the following example:
Korean: 나는 (I) 학교에 (school) 갔다 (went to).
Chinese: 我 (I) 去 (went to) 学校 (school)。
English: I went to school.
The word order of the Chinese sentence is quite similar to that of the English sentence, but not to that of the Korean sentence. It is much more difficult to align the words of two sentences accurately if the word order of the language pair is different. Misalignment of word pairs (translation equivalences) between two different languages may lead to a wrong translation result. Moreover, in the statistical translation approach, it is quite difficult to deal with morphologically rich languages [1]. In Korean, a word may take different morphological forms under different conditions. Verbs and adjectives usually end with suffixes in a sentence to express different meanings. As a consequence, when translating Korean into Chinese, the analysis of word morphology is important to the development of an MT system because of these word variations. In traditional rule-based MT, this requires a large number of grammatical rules to resolve word ambiguities, and the whole process is very time-consuming.
Several studies have tackled the problems in Korean and Chinese SMT; most of them focus on pre-processing and post-processing methods, such as reordering the source sentences in Chinese-Korean translation [2]. To account for the different word orders of Korean and Chinese, transforming the syntactic relations of Chinese SVO patterns and inserting the corresponding transferred relations as pseudo words [1] have been considered to enrich the Chinese sentences in a Chinese-Korean SMT system.
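The effect of word order on alignment can be illustrated with a small sketch in Python. It is not part of the system described in this paper: the function names `align_by_gloss` and `count_crossings` are hypothetical, and matching tokens by their English glosses is a deliberate simplification of what a statistical word aligner does. The token orders are taken from the glosses of the example sentence (SOV for Korean, SVO for Chinese and English).

```python
from itertools import combinations


def align_by_gloss(source, target):
    """Link each source position to the target position with the same gloss.

    Hypothetical helper: real SMT aligners estimate links statistically.
    """
    return [(i, target.index(token)) for i, token in enumerate(source)]


def count_crossings(alignment):
    """Count pairs of alignment links that cross each other."""
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(alignment, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


# Token orders given by the glosses in the example sentence.
korean = ["I", "school", "went to"]   # SOV
chinese = ["I", "went to", "school"]  # SVO
english = ["I", "went to", "school"]  # SVO

ko_zh = align_by_gloss(korean, chinese)
zh_en = align_by_gloss(chinese, english)

print("Korean-Chinese crossings:", count_crossings(ko_zh))
print("Chinese-English crossings:", count_crossings(zh_en))
```

Under these assumptions, the Korean-Chinese pair contains a crossing link between the object and the verb, while the Chinese-English pair is monotone. Crossing links of this kind are what make alignment harder for language pairs with different word orders.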
In line with the idea of adding richer information to SMT models, a factored translation model was proposed by Koehn and Hoang [3]. They use additional grammatical information, such as part-of-speech (POS) tags, to train the translation model using Moses [4], an open-source SMT system. However, there is not much work on statistical translation from Korean to Chinese. In this paper, we propose a Korean-to-Chinese machine translation system based on a statistical approach. Because Korean-Chinese parallel documents are lacking and research resources for Korean-to-Chinese SMT are scarce, we had to build a parallel corpus ourselves by automatically acquiring content from the Internet. This article is structured as follows.