Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xian, 15-17 July, 2012
KOREAN-CHINESE STATISTICAL TRANSLATION MODEL
SHUO LI, DEREK F. WONG, LIDIA S. CHAO
Department of Computer and Information Science, University of Macau, Macau SAR, China
E-MAIL: mb05486@umac.mo, derekw@umac.mo, lidiasc@umac.mo
Abstract:
Korean and Chinese belong to different language families,
and there has been very little research on statistical machine
translation between them. The word order of the two
languages is quite different, and Korean is morphologically
rich compared with Chinese.
Hence, translating Korean into Chinese requires more linguistic
knowledge to achieve a better translation result. This
paper presents a Korean-to-Chinese machine translation system
that incorporates different linguistic data of Korean into the
translation model. A state-of-the-art factored translation model
is employed to verify the proposed approach,
which is effective not only for European languages but also
for Korean and Chinese. Experimental results demonstrate
that the proposed method achieves
better performance by integrating different types of linguistic
information.
Keywords:
Statistical Machine Translation; Korean-Chinese; Factored
Translation Model
1. Introduction
In recent decades, machine translation (MT) has
developed rapidly and has been successfully applied to the
translation of various domains. Among the variety of MT
approaches, statistical machine translation (SMT) has become a
promising direction and receives the most attention in the
community. SMT uses statistical techniques to translate a
source language into a target language, without relying on
hand-crafted rules for specific languages. There are many
successful and valuable SMT studies on European
languages, but very few on Asian language pairs
such as Korean-Chinese. Korean and Chinese belong to
different language families, and their morphologies
also differ from those of European languages. Korean is a
typical subject-object-verb (SOV) language, while
Chinese is a subject-verb-object (SVO) language. Consider the
following example:
Korean: 나는(I) 학교에(school) 갔다(went to).
Chinese: 我(I) 去(went to) 学校(school)。
English: I went to school.
978-14673-1487-9/12/$31.00 ©2012 IEEE
The word order of the Chinese sentence is quite similar to
that of the English sentence, but not to that of the Korean sentence. It is
much more difficult to align the words of two sentences
accurately if the word order of the language pair differs.
Misalignment of word pairs (translation equivalences) in
two different languages may lead to a wrong translation result.
Moreover, in the statistical translation approach, it is quite
difficult to deal with morphologically rich languages [1]. In
Korean, a word may take different morphological forms under
different conditions. Verbs and adjectives usually end with
suffixes in a sentence to express different meanings. As a
consequence, in translating Korean into Chinese, the analysis of
word morphology is important to the development of an MT
system, because of the variations of words. In traditional
rule-based MT, this requires a large amount of grammatical
rules to resolve word ambiguities, and the whole process is
very time-consuming.
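To make the word-order point concrete, the following minimal Python sketch counts crossing alignment links for the example sentence above. The English glosses stand in for the actual Korean and Chinese words, and the alignments are derived by simply matching glosses. The function names `count_crossings` and `align_by_gloss` and the hand-made token lists are illustrative assumptions, not part of the system described in this paper.

```python
from itertools import combinations


def count_crossings(links):
    """Count pairs of alignment links (src_idx, tgt_idx) that cross."""
    return sum(
        1
        for (i1, j1), (i2, j2) in combinations(links, 2)
        if (i1 - i2) * (j1 - j2) < 0  # opposite relative order => crossing
    )


def align_by_gloss(src, tgt):
    """Toy aligner (assumption): link each source token to the identical gloss."""
    return [(i, tgt.index(word)) for i, word in enumerate(src)]


# Glosses of the example sentence, in each language's word order.
korean = ["I", "school", "went to"]   # SOV
chinese = ["I", "went to", "school"]  # SVO
english = ["I", "went to", "school"]  # SVO

ko_zh = align_by_gloss(korean, chinese)
en_zh = align_by_gloss(english, chinese)

print("Korean-Chinese crossings:", count_crossings(ko_zh))
print("English-Chinese crossings:", count_crossings(en_zh))
```

Even in this three-word sentence the Korean-Chinese alignment contains a crossing link while the English-Chinese alignment is monotone; longer sentences with several arguments would be expected to show more crossings.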
Several studies have tackled the
problems in Korean and Chinese SMT. Most of them
focus on pre-processing and post-processing methods,
such as reordering the source sentences in Chinese-Korean
translation [2]. Given the different word orders of Korean
and Chinese, transformation of the syntactic relations of
Chinese SVO patterns and insertion of the corresponding
transferred relations as pseudo words [1] are considered to
enrich the Chinese sentences in a Chinese-Korean SMT
system. In line with adding richer information to SMT
models, a factored translation model was proposed by Koehn
and Hoang [3]. They use additional grammatical information,
such as part-of-speech (POS) tags, to train the translation model
using Moses [4], an open-source SMT system.
However, there is not much work on statistical
translation from Korean to Chinese.
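As a rough illustration of what "factored" means here, each surface word can be annotated with additional factors such as a lemma and a POS tag, and Moses reads factored corpora in which the factors of a token are joined by the `|` character. The sketch below builds such tokens in Python. The helper `to_factored`, the gloss-based tokens, and the tag labels are hypothetical placeholders, not the tagset or data used in this paper.

```python
def to_factored(tokens, sep="|"):
    """Join each token's factors (surface, lemma, POS) into one string."""
    out = []
    for factors in tokens:
        # Factors must not contain the separator or whitespace,
        # otherwise the line could not be split back unambiguously.
        if any(sep in f or " " in f for f in factors):
            raise ValueError(f"factor contains separator or space: {factors}")
        out.append(sep.join(factors))
    return " ".join(out)


# Hypothetical annotation of the glossed source sentence (SOV order).
sentence = [
    ("I", "I", "PRON"),
    ("school", "school", "NOUN"),
    ("went-to", "go", "VERB"),
]

line = to_factored(sentence)
print(line)
```

A translation model trained on such input can, in principle, back off from the surface form to the lemma or POS factor when an inflected form is rare, which is the motivation for applying it to a morphologically rich source language.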
In this paper, we propose a Korean-to-Chinese machine
translation system based on the statistical approach. Because
Korean-Chinese parallel documents are lacking and research
resources on Korean-to-Chinese SMT are scarce, we have to build a
parallel corpus ourselves by automatically acquiring the
content from the Internet. This article is structured as follows.