Multilingual linguistic resources: from monolingual lexicons to bilingual interrelated lexicons

Marta Villegas*, Nuria Bel, Alessandro Lenci#, Nicoletta Calzolari#, Nilda Rumy#, Antonio Zampolli#, Teresa Sadurní*, Joan Soler*

GILCUB (Grup Investigació Lingüística Computacional Universitat Barcelona)
{tona,nuria}@gilcub.es
# Istituto di Linguistica Computazionale, CNR
{lenci,glotollo,nilda,parole}@ilc.pi.cnr.it
Institut d’Estudis Catalans
{mvillegas,tsadurni,jsoler}@iec.es

Abstract

This paper describes a procedure for converting the PAROLE-SIMPLE monolingual lexicons into bilingual interrelated lexicons in which each word sense in a given language is linked to the pertinent sense of the corresponding words in one or more target lexicons. At present the SIMPLE lexicons are monolingual, although the ultimate goal of these harmonized monolingual lexicons is to build multilingual lexical resources. To achieve this goal, the linking of senses across the different monolingual lexicons must be automated, since producing such multilingual relations by hand would be, like all tasks related to the development of linguistic resources, unaffordable in terms of human resources and time. The system described in this paper takes advantage of the SIMPLE model and the SIMPLE-based lexicons so that, in the best case, it can find fully automatically the sense-to-sense correspondences that determine the translational equivalence of two words in two different languages and, in the worst case, it can narrow the set of admissible links between words and relevant senses. The paper also explores to what extent the semantic encoding in existing computational lexicons such as SIMPLE can help overcome the problems that arise when monolingual meaning descriptions are used for bilingual links, and it aims to lay the basis for a model that adds a bilingual layer to the SIMPLE model.
This bilingual layer, based on a bilingual relation model, will in turn be the basis for defining the multilingual language resource that we want the PAROLE-SIMPLE lexicons to become.

1. Introduction

The main objective of the work presented in this paper is the reuse of existing lexical resources and the automatic production of further information to enrich them, so that they can become the basis for a broad range of HLT applications. More specifically, the objective was to study the feasibility of reusing the SIMPLE monolingual semantic lexicons to build a multilingual lexical resource. SIMPLE is a follow-up of the PAROLE project (see www.ub.es/gilcub/SIMPLE/simple.html). It has added a semantic layer to the morphological and syntactic layers already developed by PAROLE, which constitute a harmonized common model for encoding relevant information in computational lexicons. The semantic lexicons produced (about 10,000 semantic units for each of the 12 PAROLE languages) follow a harmonized common model that encodes structured semantic types and frames, linked to syntactic and morphological information.

The ultimate aim of the work we are reporting is to define a new layer of information that supplies a model for encoding word-to-word links, paired via sense-to-sense correspondences, between two or more monolingual computational lexicons. This model has to provide the means to create links among the words contained in the different lexicons: bilingual links as a first step and multilingual links eventually. This paper, however, is mainly concerned with the procedures that will allow the automatic creation of links among words based on their translational equivalence. The starting point has been to exploit traditional bilingual dictionaries, as they are the obvious and most extensive repository of bilingual knowledge.
Since these dictionaries are intended for human consultation, however, the only information we can rely on is the word-to-word correspondences, as traditional bilingual dictionaries carry little systematic information about the constraints that the input and target senses must satisfy for the words to be related.¹ Thus an entry for the Spanish word manzana in a Spanish-Catalan bilingual dictionary may look like:

(1) manzana: 1. (Fruit) poma ('apple'), 2. (of houses) illa ('block')

Once the words that can be considered translational equivalents in at least one case have been extracted, the key point is to determine on which sense the correspondence is based, so that the combination 'word+sense' can be treated as an element of a fully translationally equivalent pair for both languages. The most obvious argument for this sense identification is the need to ensure bi-directionality between bilingual dictionary entries. For example, while in (1) above we know that the correspondence manzana-poma holds bi-directionally, bi-directionality does not hold for the correspondence manzana-illa, as the Catalan entry illa can also refer to an island. Such partial equivalents are the most frequent case in bilingual dictionaries, owing to the polysemy of most words.

¹ Sometimes there is no information at all, or it is expressed non-systematically in terms of (i) a semantic descriptor or hyperonym; (ii) an example; (iii) a reference to a domain; etc.
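
The bi-directionality argument made for example (1) can be illustrated with a minimal word-level sketch in Python. It is not the procedure developed in this paper, which works on SIMPLE sense descriptions; it only shows why word-to-word correspondences alone leave many pairs as partial equivalents. The dictionary contents, the function name classify_pair, and the labels it returns are assumptions made for illustration; in particular, the Catalan-to-Spanish entries (including "isla" as a second translation of illa) are hypothetical and not taken from any dictionary.

```python
# Toy word-to-word correspondences, as they might be extracted from a
# bilingual dictionary. The Catalan-to-Spanish entries are assumed for
# illustration only ("isla" is a hypothetical second translation of "illa").
ES_CA = {"manzana": {"poma", "illa"}}
CA_ES = {"poma": {"manzana"}, "illa": {"manzana", "isla"}}


def classify_pair(source, target, forward, backward):
    """Classify a source-target word pair using word-level evidence only."""
    if target not in forward.get(source, set()):
        return "unrelated"
    back = backward.get(target, set())
    if source not in back:
        return "one-way"
    # Extra back-translations signal that the target word has senses the
    # source word does not cover, so the pair is only a partial equivalent.
    return "bidirectional" if back == {source} else "partial"


for tgt in sorted(ES_CA["manzana"]):
    print("manzana", tgt, classify_pair("manzana", tgt, ES_CA, CA_ES))
```

Under these assumed entries, manzana-poma is classed as bidirectional and manzana-illa as partial. Deciding which sense of illa licenses the link is exactly what the word-level evidence cannot do, and this is where the semantic information encoded in the monolingual lexicons is expected to help.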