‘Irrefragable answers’ using comparable corpora
to retrieve translation equivalents
Serge Sharoff · Bogdan Babych · Anthony Hartley
Published online: 12 December 2007
© Springer Science+Business Media B.V. 2007
Abstract In this paper we present a tool that uses comparable corpora to find
appropriate translation equivalents for expressions that translators consider
difficult. For a phrase in the source language, the tool identifies a range of possible
expressions used in similar contexts in target language corpora and presents them to
the translator as a list of suggestions. In the paper we discuss the method and present
results of human evaluation of the performance of the tool, which highlight its
usefulness when dictionary solutions are lacking.
Keywords Large comparable corpora · Translation equivalents · Multiword
expressions · Distributional similarity
1 Introduction
There is no doubt that both professional and trainee translators need access to
authentic data provided by corpora. With respect to polysemous lexical items,
bilingual dictionaries list several translation equivalents for a headword, but words
taken in their contexts can be translated in many more ways than indicated in
dictionaries. For instance, the Oxford Russian Dictionary (ORD) lacks a translation
for the Russian expression исчерпывающий ответ (‘exhaustive answer’), while the
Multitran Russian–English dictionary suggests that it can be translated as
irrefragable answer. Yet this expression is extremely rare in English; on the
Internet it occurs mostly in pages produced by Russian speakers.
On the other hand, translations for polysemous words are too numerous to be listed
for all possible contexts. For example, the entry for strong in ORD already has 57
subentries and yet it fails to mention many word combinations frequent in the British
S. Sharoff (✉) · B. Babych · A. Hartley
Centre for Translation Studies, University of Leeds, Leeds LS2 9JT, UK
e-mail: S.Sharoff@leeds.ac.uk
Lang Resources & Evaluation (2009) 43:15–25
DOI 10.1007/s10579-007-9046-4