A Supervised Method of Feature Weighting for Measuring Semantic Relatedness

Alistair Kennedy¹ and Stan Szpakowicz¹˒²

¹ SITE, University of Ottawa, Ottawa, Ontario, Canada
{akennedy,szpak}@site.uottawa.ca
² Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

Abstract. The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known techniques of word clustering use the context of a word to determine its meaning: words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually weights contexts by some measure of their importance. One of the most popular measures is Pointwise Mutual Information. It increases the weight of contexts where a word appears regularly but other words do not, and decreases the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. We present a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses Pointwise Mutual Information to weight these contexts by how well they indicate closely related words. We use Roget's Thesaurus as a source of training and evaluation data. This work is a step towards adding new terms to Roget's Thesaurus automatically, and doing so with high confidence.

1 Introduction

Pointwise Mutual Information (PMI) is a measure of association between two particular values of two random variables. PMI has been applied to a variety of Natural Language Processing (NLP) tasks, and has been shown to work well for identifying contexts indicative of a given word. In effect, PMI can be used to give higher weights to contexts in which a word occurs frequently but other words appear rarely, while giving lower weights to contexts with distributions closer to random.
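As an illustration of this unsupervised weighting, here is a minimal sketch that assumes a sparse word-context matrix stored as a Python dict keyed by (word, context) pairs. It computes PMI(w, c) = log2(P(w, c) / (P(w) P(c))), estimating each probability by relative frequency over all counts. The function name `pmi_weights`, the base-2 logarithm and the toy counts are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict


def pmi_weights(counts):
    """Replace each (word, context) count with its PMI score.

    Hypothetical helper: `counts` maps (word, context) -> co-occurrence count,
    i.e. the non-zero cells of a word-context matrix.
    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ), probabilities by relative frequency.
    """
    total = sum(counts.values())

    # Marginal counts: row sums (words) and column sums (contexts).
    word_tot = defaultdict(int)
    ctx_tot = defaultdict(int)
    for (w, c), n in counts.items():
        word_tot[w] += n
        ctx_tot[c] += n

    weights = {}
    for (w, c), n in counts.items():
        if n <= 0:
            continue  # log(0) is undefined; unseen pairs keep no weight
        p_wc = n / total
        p_w = word_tot[w] / total
        p_c = ctx_tot[c] / total
        weights[(w, c)] = math.log2(p_wc / (p_w * p_c))
    return weights


# Toy counts (invented for illustration): "the" occurs with both words,
# while "purr" and "bark" each occur with only one word.
toy_counts = {
    ("cat", "purr"): 4,
    ("cat", "the"): 4,
    ("dog", "bark"): 4,
    ("dog", "the"): 4,
}

weights = pmi_weights(toy_counts)
print(weights[("cat", "purr")])  # 1.0
print(weights[("cat", "the")])   # 0.0
```

In this toy example the context "the" is spread evenly over both words, so its observed co-occurrence equals what independence predicts and its weight is 0. The context "purr" occurs only with "cat", so it gets a positive weight. No labelled data is involved at any step, which is why this kind of weighting is unsupervised.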
Finding these weights requires no actual training data, so it is essentially an unsupervised method of context weighting, an observation also made in [1]. In our paper we show how to incorporate supervision into the process of context weighting. We learn appropriate weights for the contexts from known sets of related and unrelated words extracted from a thesaurus. PMI is then calculated for each context: we measure the association between pairs of words which appear in that context and pairs of words which are known to be semantically related. The PMI scores can then be used to apply a weight to the contexts in which a word is found. This is done by building a word-context matrix which records the counts