Hybrid Machine Aided Translation System based
on Constraint Synchronous Grammar and
Translation Corresponding Tree
Fai Wong, Francisco Oliveira, Yiping Li
Faculty of Science and Technology, University of Macau, Macao
Email: {derekfw, olifran, ypli}@umac.mo
Abstract—As the demand for translating large volumes of
material between Portuguese and Chinese grows rapidly,
especially in the city of Macau, translation work becomes
impractical without the support of effective tools. To fill
this gap and provide a translation workbench environment
for translators, a Machine Aided Translation System between
these languages, PCTAssist, is introduced. It is a hybrid
system that applies not only Translation Memory technology
but also Machine Translation methodologies: the annotation
schema of Translation Corresponding Tree (TCT) to represent
bilingual examples, and the language formalism Constraint
Synchronous Grammar (CSG) to analyze the syntactic
structures of the two languages in order to accomplish the
translation task.
Index Terms—Machine Translation, Constraint
Synchronous Grammar, Translation Corresponding Tree
I. INTRODUCTION
The advancement of computer technologies has changed
daily life in many ways. As more documents have to be
translated every day, human translation becomes
impractical without the help of computer tools. These
include electronic dictionaries, terminology corpora,
translation memories, and automatic translation. Such
tools are often combined into integrated systems that
support translators' daily work, and these systems can be
classified as Automatic Machine Translation (MT) systems
and Machine Aided or Computer Assisted Translation (CAT)
systems. Automatic MT systems generate the translation
from the information in their knowledge base without
human intervention. CAT systems, on the other hand,
first produce a preliminary translation, and translators
then make the necessary changes depending on its quality.
A large number of such systems are available on the
market today. They differ in supported file formats,
languages, operating systems, functionality, and price;
a list can be found in [1]. Moreover, different approaches
to MT have been proposed in the literature. The Rule
based MT approach [2] relies on a set of linguistic
grammar rules to perform the translation, and can be
categorized into Direct, Transfer based, and Interlingua
based approaches. These differ in the linguistic context
they define, the knowledge they use, and the number of
stages needed to translate a sentence. Direct approaches
only perform word by word translation and ignore all
syntactic and semantic information. The Transfer based
approach consists of three modules: the analyzer module
analyzes the source text and converts it into an
intermediate representation; the transfer module maps
this representation into a target language structure
according to a set of conversion rules; and the
generation module synthesizes the transferred
representation into the corresponding target language
text. The Interlingua approach has no transfer module,
since the source text is analyzed into a language-
independent representation from which the target text is
generated directly. Example based MT [3][4] analyzes
pieces of bilingual examples stored in parallel corpora
to generate the translation; however, its results depend
heavily on the quality of the examples and on the
similarity function applied. As more digitized resources
become available, Statistical approaches [5][6] have
become a new research trend. These approaches rely on
probabilities, estimated from the corpora, of word
translations and of word order within sentences. Their
accuracy is often highly dependent on the available
digitized resources.
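To make the three modules of the Transfer based approach more concrete, the following is a minimal Python sketch of an analyze, transfer, generate pipeline. The part-of-speech tags, the lexicon, and the transfer rules (dropping Portuguese articles, moving a post-nominal Portuguese adjective in front of the noun as Chinese requires, and linking the two with 的) are hypothetical toy assumptions chosen for illustration. They are not the rules of PCTAssist or of any particular system.

```python
from dataclasses import dataclass

# Hypothetical toy resources, for illustration only.
POS_TAGS = {"o": "DET", "a": "DET", "livro": "NOUN", "casa": "NOUN",
            "vermelho": "ADJ", "grande": "ADJ"}
LEXICON = {"livro": "书", "casa": "房子", "vermelho": "红色", "grande": "大"}


@dataclass
class Token:
    pos: str   # part-of-speech tag
    text: str  # source word before transfer, target word after transfer


def analyze(sentence: str) -> list[Token]:
    """Analyzer: convert source text into an intermediate representation."""
    return [Token(POS_TAGS.get(w, "UNK"), w) for w in sentence.lower().split()]


def transfer(tokens: list[Token]) -> list[Token]:
    """Transfer: map the representation onto a target-language structure."""
    # Toy rule 1: Chinese has no articles, so determiners are dropped.
    tokens = [t for t in tokens if t.pos != "DET"]
    # Toy rule 2: Portuguese NOUN ADJ becomes Chinese ADJ NOUN.
    reordered, i = [], 0
    while i < len(tokens):
        if (i + 1 < len(tokens) and tokens[i].pos == "NOUN"
                and tokens[i + 1].pos == "ADJ"):
            reordered += [tokens[i + 1], tokens[i]]
            i += 2
        else:
            reordered.append(tokens[i])
            i += 1
    # Lexical transfer: replace each source word with its target word.
    return [Token(t.pos, LEXICON.get(t.text, t.text)) for t in reordered]


def generate(tokens: list[Token]) -> str:
    """Generation: synthesize the target sentence from the transferred structure."""
    parts = []
    for i, t in enumerate(tokens):
        parts.append(t.text)
        # Toy rule 3: link a prenominal adjective to its noun with the particle 的.
        if t.pos == "ADJ" and i + 1 < len(tokens) and tokens[i + 1].pos == "NOUN":
            parts.append("的")
    return "".join(parts)


def translate(sentence: str) -> str:
    return generate(transfer(analyze(sentence)))


print(translate("o livro vermelho"))  # 红色的书
print(translate("a casa grande"))     # 大的房子
```

Keeping the three stages as separate functions mirrors the modular structure described above: a different target language would, in principle, only need new transfer rules and a new generator, while the analyzer could be reused.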
Each of these approaches has its own strengths and
weaknesses when used alone to develop an MT system.
Combining them into a hybrid system helps to avoid the
intrinsic limitations of the individual translation
methods [7][8].
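One common way to combine the approaches can be illustrated with a minimal Python sketch. It assumes a translation memory searched with a similarity function (the example based step), with a fallback to rule based translation when no stored example is similar enough. The memory contents, the dictionary, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity function are all hypothetical choices for illustration. This is not the architecture of PCTAssist, which uses TCT and CSG rather than the toy Direct fallback shown here.

```python
import difflib

# Hypothetical translation memory: Portuguese sentence -> Chinese translation.
TRANSLATION_MEMORY = {
    "o livro é vermelho": "这本书是红色的",
    "a casa é grande": "这座房子很大",
}
# Hypothetical word dictionary for the fallback (articles map to nothing).
DICTIONARY = {"o": "", "a": "", "é": "是", "livro": "书", "casa": "房子",
              "vermelho": "红色", "grande": "大"}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; one possible choice of metric."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def hybrid_translate(sentence: str, threshold: float = 0.8) -> tuple[str, str, float]:
    """Return (translation, method used, best similarity score)."""
    query = sentence.lower().strip()
    # Example based step: retrieve the most similar stored example.
    best_src = max(TRANSLATION_MEMORY, key=lambda src: similarity(query, src))
    best_score = similarity(query, best_src)
    if best_score >= threshold:
        return TRANSLATION_MEMORY[best_src], "memory", best_score
    # Rule based fallback: Direct, word by word translation.
    # Note that it keeps the source word order and ignores syntax.
    words = [DICTIONARY.get(w, w) for w in query.split()]
    return "".join(words), "direct", best_score


print(hybrid_translate("o livro é vermelho"))  # ('这本书是红色的', 'memory', 1.0)
print(hybrid_translate("livro grande"))
```

An exact match is returned from the memory with a score of 1.0. In contrast, "livro grande" falls below the threshold and the Direct fallback yields 书大, which keeps the Portuguese word order instead of the correct Chinese 大书. This is the kind of weakness that a syntax-aware component in a hybrid system is meant to compensate for. In a CAT setting, a fuzzy memory match above the threshold would likewise be offered to the translator as a preliminary result to post-edit.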
Although information about these tools is easy to find,
there is no practical commercial Machine Aided
Translation System developed specifically for the
Portuguese and Chinese languages. These two languages
play an important role in the city of Macau, where both
are official languages.
In this paper, a hybrid Machine Aided Portuguese-Chinese
translation system, PCTAssist, is presented. The system
provides a helpful translation tool for translators who
need to work with these two languages. Moreover, it is
designed to integrate the advantages of the Rule based
and Example based approaches while mitigating their
disadvantages.
Since Portuguese and Chinese come from different
language families, they differ greatly in both writing
system and grammar. Table 1 shows some common
non-standard linguistic relationships between them.
JOURNAL OF COMPUTERS, VOL. 7, NO. 2, FEBRUARY 2012 309
© 2012 ACADEMY PUBLISHER
doi:10.4304/jcp.7.2.309-316