Text-Translation Alignment:
Three Languages Are Better Than Two *
Michel Simard
Laboratoire de recherche appliquée en linguistique informatique (RALI)
Université de Montréal
SimardM@IRO.UMontreal.CA
Abstract
In this article, we show how a bilingual text-
translation alignment method can be adapted
to deal with more than two versions of a text.
Experiments on a trilingual corpus demonstrate
that this method yields better bilingual align-
ments than can be obtained with bilingual text-
alignment methods. Moreover, for a given num-
ber of texts, the computational complexity of
the multilingual method is the same as for bilin-
gual alignment.
Introduction
While bilingual text corpora have been part of
the computational linguistics scene for over ten
years now, we have recently witnessed the ap-
pearance of text corpora containing versions of
texts in three or more languages, such as those
developed within the CRATER (McEnery et
al., 1997), MULTEXT (Ide and Véronis, 1994)
and MULTEXT-EAST (Erjavec and Ide, 1998)
projects. Access to corpora of this type raises
a number of questions: Do they make new ap-
plications possible? Can methods developed for
handling bilingual texts be applied to multilin-
gual texts? More generally: is there anything to
gain in viewing multilingual documents as more
than just multiple pairs of translations?
Bilingual alignments have so far shown that
they can play multiple roles in a wide range
of linguistic applications, such as computer-
assisted translation (Isabelle et al., 1993; Brown
et al., 1990), terminology (Dagan and Church,
1994), lexicography (Langlois, 1996; Klavans
and Tzoukermann, 1995; Melamed, 1996), and
cross-language information retrieval (Nie et al.,
1998). However, the case for trilingual and mul-
tilingual alignments is not as clear. True multi-
lingual resources such as multilingual glossaries
are not widely used, and when such resources
exist, their real purpose is usually
to provide bilingual resources for multiple pairs
of languages in a compact way.

* This research was funded by the Canadian De-
partment of Foreign Affairs and International Trade
(http://www.dfait-maeci.gc.ca/), via the Agence de
la francophonie (http://www.francophonie.org).

What we intend to show here is that while
multilingual correspondences may not be inter-
esting in themselves, multilingual text align-
ment techniques can be useful as a means of
extracting information on bilingual correspon-
dences. Our idea is that each additional version
of a text should be viewed as valuable informa-
tion that can be used to produce better align-
ments. In other words: whatever the intended
application, three languages are better than two
(and, more generally: the more languages, the
merrier!).
After going through some definitions and pre-
liminary material (Section 1), we present a gen-
eral method for aligning three versions of a text
(Section 2). We then describe some experiments
that were carried out to evaluate this approach
(Section 3) and various possible optimizations
(Section 4). Finally, we report on some disturb-
ing experiments (Section 5), and conclude with
directions for future work.
1 Trilingual Alignments
There are various ways in which the concept of
alignment can be formalized. Here, we choose
to view alignments as mathematical relations
between linguistic entities:
Given two texts, A and B, seen as sets of
linguistic units: $A = \{a_1, a_2, \ldots, a_m\}$ and
$B = \{b_1, b_2, \ldots, b_n\}$, we define a binary alignment
$X_{AB}$ as a relation on $A \cup B$:
$$X_{AB} = \{(a_1, b_1), (a_2, b_2), (a_2, b_3), \ldots\}$$
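To make the definition concrete, a binary alignment can be sketched as a plain set of pairs. This is only an illustration: the unit labels, the sizes of A and B, and the helper aligned_with are assumptions introduced here, not part of the method itself.

```python
# Illustrative sketch only: the unit labels and the helper below are
# hypothetical; the definition leaves the choice of linguistic unit open.
A = ["a1", "a2", "a3"]          # units of text A (m = 3, assumed)
B = ["b1", "b2", "b3", "b4"]    # units of text B (n = 4, assumed)

# A binary alignment seen as a relation: a set of pairs of units.
# Note that a2 is related to both b2 and b3 (a one-to-many correspondence).
X_AB = {("a1", "b1"), ("a2", "b2"), ("a2", "b3")}


def aligned_with(alignment, unit):
    """Return every unit paired with `unit`, whichever side it appears on."""
    partners = set()
    for x, y in alignment:
        if x == unit:
            partners.add(y)
        elif y == unit:
            partners.add(x)
    return partners
```

Under this representation, asking which units correspond to a2 amounts to calling aligned_with(X_AB, "a2"), and a unit that appears in no pair simply yields an empty set.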