Intelligent Information Systems
ISBN 978-83-7051-580-5, pages 63–76

Evaluating LexCSD – a Weakly-Supervised Method on Improved Semantically Annotated Corpus in a Large Scale Experiment*

Bartosz Broda, Maciej Piasecki, and Marek Maziarz
Institute of Informatics, Wroclaw University of Technology, Poland

Abstract

Word Sense Disambiguation (WSD) in text remains a difficult problem because the best supervised methods require laborious and costly manual preparation of training data. Unsupervised methods, on the other hand, achieve significantly lower accuracy and produce results that are not satisfactory for many applications. Recently, a weakly-supervised WSD algorithm called Lexicographer-Controlled Semi-automatic Sense Disambiguation (LexCSD) was proposed. The method is based on clustering text snippets that contain the word in focus. For each cluster we find a core, which a human labels with a word sense and which is then used to train a classifier. The classifiers, constructed separately for each word, are applied to text. The goal of this work is to evaluate LexCSD trained on a large amount of untagged text. The comparison showed that the approach outperforms the most-frequent-sense baseline in most cases and in some cases beats its supervised equivalents. For the purposes of the experiment, the semantically annotated corpus was improved in terms of its coverage of the sense inventory and its annotations.

1 Introduction

The aim of Word Sense Disambiguation (WSD) is to choose the right sense (lexical meaning) for a word in a context. Many words have more than one sense, but usually only one of them is active in a given context. For example, the electronic thesaurus WordNet (Fellbaum et al., 1998) has 36 entries for line. WSD is a difficult but important problem for many applications in Natural Language Processing (NLP). Machine translation is an obvious example, as a robust WSD system helps in choosing the correct translation in context.
Information retrieval, information extraction, text mining, and computer-aided lexicography could also benefit from a high-quality WSD system (Agirre and Edmonds, 2006).

WSD is not an easy problem to solve, partly because the definition of lexical meaning is not clear and the boundaries between different senses are neither crisp nor obvious (Kilgarriff, 2006). To sidestep the theoretical side of this problem, dictionaries are used as a means of enumerating the different senses of a word. In WSD, such a set of senses is called a sense inventory.

* This work was supported by a fellowship co-financed by the European Union within the European Social Fund and co-financed by Innovative Economy Programme project POIG.01.01.02-14-013/09.