On Integration of Terminological Data in Translation Systems

Signe Rirdance, Andrejs Vasiljevs
Tilde, Latvia
signe.rirdance@tilde.lv, andrejs@tilde.lv

Abstract

Current translation practice demonstrates a lack of integration between traditional desktop translation tools and the rich terminological data available on the internet. This article sets the background for the development of a new layer of web-based translation tools for automated translation of multilingual terminology, bridging the gap between translation tools and environments on the one hand and internet term banks on the other. It analyses the experience gained during the EuroTermBank project, which proposes solutions to a number of challenges in integrating term banks with translation tools, such as the federation approach to interlinking term banks and the entry compounding approach to the visual representation of multiple overlapping terminology entries. The article proposes a standards-based approach to ensure data compatibility, and identifies the requirement to support terminology sharing on an interoperable level.

Index Terms: term bases, translation tools, terminology sharing

1. Introduction

In today’s translation practice, a significant gap exists between traditional desktop translation tools and the terminological data available on the internet. Translators spend from 30% to 60% of total translation time on terminology research; it is therefore vital to ensure that they can use all the required terminology resources in the right format and in the right environment. Currently, translators use much of this time inefficiently, searching and processing information from multiple sources and converting it into the format required by their work environment. Spending time on technical aspects instead of focusing on true terminology research results in cost inefficiencies and reduced translation quality.
Moreover, translation practice often involves redundant work, with various translators identifying, creating or compiling the same terminology over and over again. To reach a new level of translation productivity, a layer of new tools and technologies is required that:
1) enables consolidation and integration of dispersed terminology resources;
2) provides online access to consolidated multilingual resources through internet term banks;
3) provides tools that connect specific translation environments with terminology resources on the internet;
4) introduces standards that enable terminology interoperability, sharing and reuse.

2. Discussion

This article sets the background for defining a toolset required for the integration of terminological data into translation environments. It briefly reviews the state of the art in commercial translation systems and analyzes the experience and best practices of the EuroTermBank project in consolidating diverse terminology resources. It concludes by introducing a layer of tools being developed to support the integration of multiple term banks with a diverse set of typical translation environments.

2.1. Terminology and translation systems

Computer tools and technologies that assist human translation are commonly known as CAT (Computer Assisted Translation) software, sometimes also referred to as MAHT (Machine Assisted Human Translation) [1].

The most basic environments assisting human translation are text processing applications, which typically provide only elementary CAT features such as spell-checking and grammar checking; no terminology support is provided. Microsoft Word provides the additional function of searching in predefined reference resources and sites, available in English and some other major languages.

Since the end of the 1980s, translation memory (TM) tools have been developed that utilize alignments or linkages between source and target texts.
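To make the notion of aligned source and target texts concrete, the following minimal Python sketch stores a few aligned segment pairs and retrieves the closest stored translation for a new source segment. It is only an illustration under stated assumptions: the names TRANSLATION_MEMORY, similarity and lookup, the sample English-Latvian segments and the 0.7 threshold are all hypothetical, and the character-based similarity from Python's standard difflib module stands in for whatever matching algorithm a commercial TM tool actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: aligned (source, target) segments.
# The English-Latvian pairs are illustrative sample data only.
TRANSLATION_MEMORY = [
    ("Save the file.", "Saglabājiet failu."),
    ("Open the file.", "Atveriet failu."),
    ("Close the window.", "Aizveriet logu."),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(segment, memory=TRANSLATION_MEMORY, threshold=0.7):
    """Return (score, source, target) for the best stored match.

    Returns None when no stored source segment reaches the threshold
    (an assumed cut-off, not a value taken from any real tool).
    """
    best = None
    for source, target in memory:
        score = similarity(segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


if __name__ == "__main__":
    # An exact match and a near ("fuzzy") match both retrieve a stored pair.
    print(lookup("Save the file."))
    print(lookup("Save the files."))
    # A segment with no close counterpart yields None.
    print(lookup("Completely unrelated sentence about cats"))
```

The sketch operates on whole segments. It does not address the separate problem of recognizing individual terms inside a segment, which is where the terminology modules discussed next come in.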
Often, translation memory tools include a terminology management module that enables the translator to search a given terminology database automatically for terms appearing in a document. However, this function is limited to searching in a proprietary terminology base format. Examples of TM tools and their terminology management modules include SDL Trados with SDL MultiTerm, Wordfast, Déjà Vu, Star Transit and others.

These tools have, however, some major drawbacks in handling terminology for translation.

MultiTerm, the terminology module of the most widely used translation environment tool, is a full-fledged terminology management application, and as such its complexity far exceeds what the majority of translators require. Translators using SDL Trados usually do not exploit the potential of MultiTerm and refrain from creating and using terminology.

Unlike translation memory handling, efficient terminology recognition requires language-specific support to match inflected term forms with the base forms recorded in the dictionary. Translation tools on the market provide morphology support for only a few major languages, which is insufficient in the global multilingual environment.

Most translation tools used by freelance translators provide no support for internet resources or for internet-based communication with language workers’ communities. While server-based work using embedded terminology workflows is supported in systems used by multinational corporations and organizations, the cost of such systems is prohibitive for the average industry practitioner.

2.2. Terminology consolidation in EuroTermBank

The goal of the EuroTermBank project [1] is to facilitate terminology data accessibility and exchange, by collecting,