Multilingual Corpora Annotation for Processing Definite Descriptions

Renata Vieira¹, Susanne Salmon-Alt², Emmanuel Schang²

¹ Unisinos, Centro de Ciências Exatas e Centro de Ciências da Communicação, Av. Unisinos, 950, 93022-000 São Leopoldo – RS, Brasil
renata@exatas.unisinos.br
² ATILF-Laboratoire d’Analyse et Traitement Informatique de la Langue Française, 44 Av. de la Libération, 54063 Nancy, France
Susanne.Salmon-Alt@inalf.fr, Emmanuel.Schang@wanadoo.fr

Abstract. This paper presents a multilingual corpus study aimed at verifying whether heuristics developed for coreference resolution in English texts also apply to Portuguese and French.

1. Introduction

The multilingual corpus study presented in this paper brings together two well-known research topics in natural language processing. One topic, widely addressed in the field of computational semantics, is the study of definite descriptions. The other topic, related to the field of information extraction, is the problem of coreference and anaphora resolution in natural language texts. Whereas much work focuses on anaphora resolution, mostly for English [3], [5], [7], [9], our work focuses on different languages (French, European Portuguese, Brazilian Portuguese) and a different type of referring expression (definite descriptions, defined as noun phrases starting with the definite article, such as "the house", "the old house", "the house that I bought").

The main motivation for our multilingual corpus study of definite descriptions is the development of a multilingual tool for anaphora and coreference resolution. It has often been advocated that, for the interpretation of definite descriptions¹, one has to find a textual antecedent that is coreferent² with the description (a house… the house), or else link the description to a non-coreferent discourse entity that serves as an anchor for its interpretation (a house… the door).
¹ Considering referential uses of definite descriptions, where interpretation relates to the identification of a previously introduced discourse entity.
² Our definition of coreference follows van Deemter & Kibble’s [16]: two noun phrases a and b corefer iff Referent(a) = Referent(b).

In that case, the coreference resolution of definite descriptions would always involve the identification of a textual antecedent. However, other studies of Swedish and English [2], [12] have shown that definite descriptions in written discourse very often do not have a textual antecedent, because the anchor used for interpretation comes from the context, from