Word Sense Disambiguation using Wikipedia

Bharath Dandala, Rada Mihalcea, and Razvan Bunescu

Bharath Dandala
Dept. of Computer Science, University of North Texas, Denton, TX, e-mail: BharathDandala@my.unt.edu

Rada Mihalcea
Dept. of Computer Science, University of North Texas, Denton, TX, e-mail: rada@cs.unt.edu

Razvan Bunescu
School of EECS, Ohio University, Athens, OH, e-mail: bunescu@ohio.edu

Abstract

This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

1 Introduction

Ambiguity is inherent to human language. In particular, word sense ambiguity is prevalent in all natural languages, with a large number of the words in any given language carrying more than one meaning. For instance, the English noun plant can mean green plant or factory; similarly the French word feuille can mean leaf or paper. The correct sense of an ambiguous word can be selected based on the context where it occurs, and correspondingly the problem of word sense disambiguation is defined as the task of automatically assigning the most appropriate meaning to a polysemous word within a given context.

Two well studied categories of approaches to WSD are represented by knowledge-based [27, 17, 40] and data-driven [54, 41, 44] methods. Knowledge-based methods rely on information drawn from wide-coverage lexical resources such as WordNet [35]. Their performance has been generally constrained by the limited amount of lexical and semantic information present in these resources. In a recent effort to alleviate the semantic information bottleneck, Ponzetto and Navigli [46] created Word-