A Maximum Entropy Approach to Disambiguating VerbNet Classes

Lin Chen
University of Illinois at Chicago
Chicago, IL, USA
lin@chenlin.net

Barbara Di Eugenio
University of Illinois at Chicago
Chicago, IL, USA
bdieugen@uic.edu

Abstract

This paper focuses on verb sense disambiguation cast as inferring the VerbNet class to which a verb belongs. To train three different supervised learning models, Maximum Entropy (MaxEnt), Naive Bayes, and Decision Tree, we used lexical, co-occurrence and typed-dependency features. For each model, we built three classifiers: one single classifier for all verbs, one single classifier for polysemous verbs only, and an ensemble of classifiers, one per polysemous verb. Among these algorithms, Naive Bayes performs surprisingly badly. In general, MaxEnt models perform better, but Decision Tree models are competitive. Our best results are obtained with classifier ensembles.

1 Introduction

Our research group has long been involved in research on the interpretation and generation of instructional texts. Not only do we believe that verbs provide a crucial component of the semantics of such texts; we have also shown that verb-based semantics helps achieve more accurate discourse parsing (Subba and Di Eugenio, 2009). For our work on discourse parsing, we developed a new resource, the HomeRepair corpus, which contains 176 documents for a total of 53,250 words. It was manually annotated with rhetorical relations and quasi-automatically annotated with semantics. It was parsed with LCFLEX (Rosé and Lavie, 2000), which we integrated with VerbNet (Kipper et al., 2008) and with CoreLex, a noun lexicon (Buitelaar, 1998) (see (Subba et al., 2006) for details). VerbNet (VN) is currently the largest English verb semantics resource. In VN, verbs are grouped in classes and subclasses.
Each VN class is completely described by thematic roles, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates; see the class remove-10.1 in Figure 1. Our parser was integrated with VerbNet 2.1, which covered 3445 different verbs, for a total of 4656 verb senses, grouped in 191 first-level classes.¹ The quasi-automatic quality of the semantic annotation of our corpus is due to the manual disambiguation of the correct interpretation among the several that LCFLEX may return. Some alternative interpretations are due to syntactic ambiguities, but others to a lack of verb sense disambiguation. For example, in the sentence "you may have to cut some tiles", cut is mapped to two distinct VN classes, BUILD-26.1 and the correct CUT-21.1.

Our work builds on much previous work on verb sense disambiguation. Verb sense disambiguation is a subtask within word sense disambiguation, but we do not have room here to review that vast literature. As concerns verb sense disambiguation, a first distinction concerns what counts as a verb sense: some of the work, e.g. (Dang and Palmer, 2005; Dligach and Palmer, 2008; Banerjee and Pedersen, 2010), focuses on verb senses variously derived from WordNet senses, not on VN class disambiguation. Other work, e.g. (Lapata and Brew, 2004), uses Levin's verb class definitions, which in turn are the foundations of VerbNet class definitions, but which result in a different classification problem. If we now turn to VN class disambiguation, distinctions in approaches concern the specific models used, the features those models are built from, and/or the corpora that are employed. Previous work on VN class disambiguation (Girju et al., 2005; Abend et al., 2008) has focused almost exclusively on standard corpora such as PropBank; more importantly, it has used no relational information between a verb and its arguments, whereas we use typed dependencies here.
¹ VerbNet 3.1, the latest version, contains 3769 different verbs, for a total of 5257 verb senses, grouped in 274 classes.
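The ensemble setup described in the abstract, one MaxEnt classifier per polysemous verb, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it trains a binary logistic-regression (MaxEnt) model per verb by plain gradient ascent, and the feature names (lemma_*, dep_dobj_*) and toy training rows are invented here solely to mimic the kinds of lexical and typed-dependency features the paper mentions, using the cut example (CUT-21.1 vs. BUILD-26.1).

```python
# Sketch: an ensemble of per-verb MaxEnt (binary logistic regression)
# classifiers for VerbNet class disambiguation. Toy data, not the
# authors' features or corpus.
import math
from collections import defaultdict

def train_maxent(rows, epochs=200, lr=0.5):
    """rows: list of (feature dict, VN class label); exactly two labels."""
    labels = sorted({label for _, label in rows})
    pos = labels[1]                       # treat second class as "positive"
    w = defaultdict(float)                # feature weights
    for _ in range(epochs):
        for feats, label in rows:
            z = sum(w[f] * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))        # P(pos | feats)
            err = (1.0 if label == pos else 0.0) - p
            for f, v in feats.items():            # gradient-ascent update
                w[f] += lr * err * v
    return labels, w

def classify(model, feats):
    labels, w = model
    z = sum(w[f] * v for f, v in feats.items())
    return labels[1] if z > 0 else labels[0]

# Toy instances for the ambiguous verb "cut" from the paper's example:
# cutting a tile -> CUT-21.1; cutting a hole/opening -> BUILD-26.1.
train = {
    "cut": [
        ({"lemma_tile": 1, "dep_dobj_tile": 1}, "CUT-21.1"),
        ({"lemma_hole": 1, "dep_dobj_hole": 1}, "BUILD-26.1"),
        ({"lemma_board": 1, "dep_dobj_board": 1}, "CUT-21.1"),
        ({"lemma_opening": 1, "dep_dobj_opening": 1}, "BUILD-26.1"),
    ],
}
# One model per polysemous verb: the classifier ensemble.
ensemble = {verb: train_maxent(rows) for verb, rows in train.items()}
print(classify(ensemble["cut"], {"lemma_tile": 1, "dep_dobj_tile": 1}))
# -> CUT-21.1
```

At prediction time the verb lemma selects the model, so each classifier only ever discriminates among the VN classes of its own verb, which is what makes the per-verb ensemble setting easier than a single classifier over all verbs.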