Building and Exploring Semantic Equivalences Resources

Gracinda Carvalho (1,2,3), David Martins de Matos (2,4), Vitor Rocio (1,3)
(1) Universidade Aberta, (2) L2F/INESC-ID Lisboa, (3) CITI - FCT/UNL, (4) Instituto Superior Técnico/UTL
(1) Rua da Escola Politécnica, 147, 1269-001 Lisboa - Portugal, (2) Rua Alves Redol 9, 1000-029 Lisboa - Portugal
gracindac@uab.pt, david.matos@inesc-id.pt, vjr@uab.pt

Abstract
Language resources that include semantic equivalences at word level are common, and their usefulness is well established in text processing applications such as search. Named entities also play an important role in text-based applications, but are not usually covered by these resources. The present work describes the WES base (Wikipedia Entity Synonym base), a freely available resource built from Wikipedia. The WES base was built for the Portuguese language, using the same format as another freely available thesaurus for that language, the TeP base, which allows equivalences to be integrated at both word level and entity level. The resource has been built in a language-independent way, so that it can be extended to other languages. The WES base was used in a Question Answering system, significantly enhancing its performance.

Keywords: Information Extraction; Information Retrieval; Question Answering; Semantic Information

1. Introduction

Semantic equivalence resources group together different words or lexical units that have the same or an equivalent meaning. Semantic equivalences have been widely used in search, where several words with the same meaning may be used interchangeably: searching for any one of these words allows the retrieval of related information that would otherwise be missed.
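The search use case can be made concrete with a small query-expansion sketch: each query term is replaced by the union of all synsets that contain it. The data below is hypothetical: the first group mirrors the TeP synset for revolucionar shown in Figure 1, and the second is an invented entity-level group of alternative names. The functions `build_index` and `expand_query` are illustrative names, not part of any system described in this paper.

```python
from collections import defaultdict

# Hypothetical toy synsets: each set groups terms with an equivalent meaning.
# The first mirrors the TeP verb synset for "revolucionar"; the second is an
# invented entity-level group of alternative names, for illustration only.
SYNSETS = [
    {"revolucionar", "transfazer", "transformar"},
    {"Portugal", "República Portuguesa"},
]


def build_index(synsets):
    """Map each term to the union of all synsets it belongs to."""
    index = defaultdict(set)
    for synset in synsets:
        for term in synset:
            index[term] |= synset  # copies members into the term's own set
    return index


def expand_query(terms, index):
    """Return the original terms plus every equivalent found in the index."""
    expanded = set(terms)
    for term in terms:
        expanded |= index.get(term, set())  # unknown terms expand to nothing
    return expanded


idx = build_index(SYNSETS)
print(sorted(expand_query(["transformar"], idx)))
```

Running this prints `['revolucionar', 'transfazer', 'transformar']`. Keeping word-level and entity-level groups in the same structure is what a shared format makes possible: one lookup covers both kinds of equivalence.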
These electronic resources, often organized as thesauri, are usually constructed from previously compiled paper publications that were the product of long years of human effort, to which digital organization adds many more hours of human work.

Another kind of semantic equivalence, not covered by the resources mentioned above, is the one occurring between named entities. Named entities are common in texts, and their correct identification is a crucial task for many text-based applications, such as information extraction, information retrieval, question answering, summarization, discourse analysis and opinion mining, to name just a few. The same entity can be referred to by different names, hence the usefulness of similar equivalence resources, in this case for names rather than words.

In this paper we present the WES base (Wikipedia Entity Synonyms base), a freely available resource built from the Portuguese version of Wikipedia, in which alternative names for the same entity are grouped together. Although this resource is built automatically, like a thesaurus it reflects information resulting from a large number of human hours. Because the information source, Wikipedia, is constantly being updated and edited, the information is extended and enhanced over time.

The WES base has a format compatible with another freely available resource for Portuguese, the TeP base (Electronic Thesaurus of Portuguese). Together, these two resources can be used to cover Portuguese synonyms at both word level and entity-name level.

The paper is organised as follows: in Section 2 we briefly describe the TeP base and its structure; in Section 3 we describe the WES base, together with the motivation for its construction, its characterisation and the results achieved. In Section 4 we describe different approaches used in the construction of named entity resources.
We end with Section 5, dedicated to conclusions.

2. TeP

The TeP base (Dias-da-Silva et al., 2000; Dias-da-Silva et al., 2008; Maziero et al., 2008) is a manually built thesaurus for the Portuguese language and a freely available resource¹. Its structure is based on WordNet synsets². We used TeP version 2.0, which has around 19,888 synsets in the morphological classes Verbo [verb], Substantivo [noun], Adjetivo [adjective] and Advérbio [adverb]. An example of a synset is presented in Figure 1; in this case it groups verbs with the same meaning as revolucionar [revolutionize].

2326. [Verbo] {revolucionar, transfazer, transformar}
Figure 1: Synset of the TeP base.

The TeP base also includes information about another semantic relationship, antonymy between synsets (indirect antonymy), indicated at the end of the synset between 〈〉. An example, in this case for nouns expressing satisfação [satisfaction], is given in Figure 2; the synset it refers to, containing nouns with the opposite meaning, is presented in Figure 3.

¹ http://www.nilc.icmc.usp.br/tep2/download.htm
² A synset is an entry representing a semantic equivalence between lexical units or words.
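To show how the synset notation of Figure 1 can be read programmatically, here is a minimal parser sketch. It assumes each synset occupies one line of the form `<id>. [<class>] {<word>, ...}` optionally followed by an antonym reference between 〈〉; the content of that reference (numeric synset identifiers separated by commas) is an assumption, not a documented TeP specification. The `Synset` class and `parse_synset` function are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Assumed line shape, based on the Figure 1 example:
#   <id>. [<class>] {<word>, <word>, ...} 〈<antonym synset ids>〉
# The trailing 〈...〉 part is optional; its numeric content is an assumption.
SYNSET_RE = re.compile(
    r"^\s*(?P<id>\d+)\.\s*"
    r"\[(?P<cls>[^\]]+)\]\s*"
    r"\{(?P<words>[^}]*)\}\s*"
    r"(?:〈(?P<ant>[^〉]*)〉)?\s*$"
)


@dataclass
class Synset:
    synset_id: int
    pos: str  # morphological class, e.g. "Verbo"
    words: list
    antonym_ids: list = field(default_factory=list)


def parse_synset(line: str) -> Synset:
    """Parse one synset line into its id, class, words and antonym ids."""
    m = SYNSET_RE.match(line)
    if m is None:
        raise ValueError(f"not a synset line: {line!r}")
    words = [w.strip() for w in m.group("words").split(",") if w.strip()]
    # Collect any integers inside 〈...〉 as referenced antonym synsets.
    antonym_ids = [int(a) for a in re.findall(r"\d+", m.group("ant") or "")]
    return Synset(int(m.group("id")), m.group("cls"), words, antonym_ids)


s = parse_synset("2326. [Verbo] {revolucionar, transfazer, transformar}")
print(s.synset_id, s.pos, s.words, s.antonym_ids)
```

For the Figure 1 line this prints `2326 Verbo ['revolucionar', 'transfazer', 'transformar'] []`. Storing antonyms as synset identifiers, rather than as words, matches the description of antonymy as a relation between synsets.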