Unsupervised WSD with a Dynamic Thesaurus *

Javier Tejada-Cárcamo 1,2, Hiram Calvo 1, Alexander Gelbukh 1

1 Center for Computing Research, National Polytechnic Institute, Mexico City, 07738, Mexico
2 Sociedad Peruana de Computación, Arequipa, Peru
jawitejada|@|hotmail.com, hcalvo|@|cic.ipn.mx, gelbukh|@|gelbukh.com

Abstract. Diana McCarthy et al. (ACL-2004) obtain the predominant sense for an ambiguous word based on a weighted thesaurus of words related to the ambiguous word. This thesaurus is obtained using Dekang Lin's (COLING-ACL-1998) distributional similarity method. Lin averages the distributional similarity over the whole training corpus; thus the list of words related to a given word in his thesaurus is given for the word as a type and not as a token, i.e., it does not depend on the context in which the word occurs. We observed that constructing a list similar to Lin's thesaurus but for a specific context converts the method by McCarthy et al. into a word sense disambiguation method. With this new method, we obtained a precision of 69.86%, which is even 7% higher than the supervised baseline.

1 Introduction

The Word Sense Disambiguation (WSD) task consists in determining the intended sense of an ambiguous word in a specific context. For example, doctor has three senses listed in WordNet: (1) a person who practices medicine; (2) a person who holds a Ph.D. degree from an academic institution; and (3) a title conferred on 33 saints who distinguished themselves through the orthodoxy of their theological teaching. The WSD task consists in determining which sense is intended, e.g., in the context The doctor prescribed me a new medicine.
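
To make the idea of a weighted list of related words selecting a sense more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical list of context-specific related words with made-up weights, hypothetical sense labels for doctor, and a toy similarity function standing in for a real WordNet-based measure. The per-word normalization over senses is likewise an assumption about how such votes might be combined.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sense labels for "doctor", each with a made-up set of
# words assumed to be close to that sense (a stand-in for a real
# WordNet-based similarity measure).
SENSE_SIGNATURES: Dict[str, set] = {
    "medical_practitioner": {"physician", "nurse", "surgeon", "patient"},
    "phd_holder": {"professor", "scholar", "researcher"},
    "church_title": {"saint", "theologian"},
}


def toy_similarity(sense: str, word: str) -> float:
    """Return 1.0 if the word is in the sense's signature, else 0.0."""
    return 1.0 if word in SENSE_SIGNATURES[sense] else 0.0


def score_senses(
    related: List[Tuple[str, float]],
    senses: List[str],
    similarity: Callable[[str, str], float],
) -> Dict[str, float]:
    """Let each weighted related word vote for the senses it resembles."""
    scores = {sense: 0.0 for sense in senses}
    for word, weight in related:
        sims = {sense: similarity(sense, word) for sense in senses}
        total = sum(sims.values())
        if total == 0.0:
            continue  # this word is unrelated to every sense; it casts no vote
        for sense in senses:
            # Split the word's weight among senses in proportion to similarity.
            scores[sense] += weight * sims[sense] / total
    return scores


# Hypothetical related words (with weights) for the context
# "The doctor prescribed me a new medicine."
related_words = [("physician", 0.5), ("nurse", 0.3), ("professor", 0.2)]
scores = score_senses(related_words, list(SENSE_SIGNATURES), toy_similarity)
best_sense = max(scores, key=scores.get)
print(best_sense, scores)
```

With these toy inputs the medical sense collects most of the weight. In a real system the related words and their weights would come from a distributional similarity method, and the similarity function from a lexical resource such as WordNet.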
This task is important, for example, in information retrieval, where the user expects documents to be selected based on a particular sense of the query word, and in machine translation and multilingual querying systems, where an appropriate translation of the word must be chosen in order to produce the translation or retrieve the correct set of documents.

The WSD task is usually addressed in two ways: (1) supervised learning, which applies machine-learning techniques trained on previously hand-tagged documents; and (2) unsupervised learning, which automatically learns, directly from raw word groupings, clues that lead to a specific sense, according to the hypothesis that different words have similar meanings if they occur in similar contexts [4, 6].

* Work done under partial support of the Mexican Government (CONACyT, SNI) and IPN (PIFI, SIP, COTEPABE). The authors thank Rada Mihalcea for useful comments and discussion.

The Senseval competitions are devoted to advancing state-of-the-art methods for WSD. For instance, the results of the Senseval-2 English all-words task are presented in Table 1. This task consists of 5,000 words of running text from three