An Approach to Acquire Word Translations from Non-Parallel Texts

Pablo Gamallo Otero 1 and José Ramom Pichel Campos 2

1 Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Galiza, Spain
pablogam@usc.es
2 Departamento de Tecnologia Linguística da Imaxin|Software, Santiago de Compostela, Galiza
jramompichel@imaxin.com

© Springer-Verlag

Abstract. Few approaches for extracting word translations from non-parallel texts have been proposed so far. Researchers have been discouraged from working on this topic because extracting information from non-parallel corpora is a difficult task that yields poor results. Whereas word translation extraction from parallel texts can reach about 99% accuracy, the accuracy for non-parallel texts has been around 72% up to now. The current approach, which relies on the prior extraction of bilingual pairs of lexico-syntactic templates from parallel corpora, achieves a significant improvement: about 89% of word translations are identified correctly.

1 Introduction

Over the last decade, much work has been devoted to automatically extracting word and/or multi-word translations from bilingual parallel corpora [15, 1, 20, 13]. These works share a common strategy: they first align segments and then, on the basis of that alignment, compute word correspondences within each pair of segments. In some of these experiments, word-level translation accuracy reached very high scores: about 99%. Unfortunately, the amount of available bilingual parallel text is still small, especially in specific academic or technological domains. Given a particular knowledge domain, the number of parallel texts is much lower than that of monolingual texts. Texts of this kind, written in different languages about the same domain but not translations of one another, are known as comparable, non-parallel corpora. On the World Wide Web today, comparable non-parallel corpora are more prevalent than parallel corpora.
However, the highest accuracy reported to date for word-level translation from non-parallel corpora is relatively low, 72% [18], compared with the accuracy achieved on parallel corpora.

This work has been supported by the Ministerio de Educación y Ciencia of Spain, within the project GARI-COTERM, ref: HUM2004-05658-D02-02.