Towards a Heuristic Categorization of Prepositional Phrases in English with WordNet

Serguei Mokhov (mokhov@cs.concordia.ca)
Frank Rudzicz (f_rudzic@cs.concordia.ca)

Revision 1.14, 7th December 2003

Abstract

This document discusses an approach, and its rudimentary realization, towards the automatic classification of prepositional phrases (PPs), a topic that has not received as much attention in NLP as noun phrases (NPs) and verb phrases (VPs). The approach consists of rule-based heuristics organized into several levels of our research. We consider seven semantic categories of PPs, which we are able to classify from an annotated corpus.

1 Introduction

Historically, prepositions have not enjoyed the attention given to nouns and verbs, being until recently relegated to the status of “an annoying little surface peculiarity” [Jac73]. However, linguistics tells us that different syntactic categories contain distinct semantic characteristics which are often exclusive to members of that category [LP91]. This raises two distinct yet equally important questions: How are such categories found syntactically, and how are their characteristics expressed semantically? The famed linguist Sir Randolph Quirk states that “a preposition expresses a relation between two entities, one being that represented by the prepositional complement” [Qui85]. In this paper we describe the development of a heuristics-based system.

2 Theoretical Foundations

A survey of the literature makes clear that the study of prepositions is becoming more intricate and is segmented into a few focussed areas. Our work is therefore directed towards the construction of a sequential system of increasingly complex levels, each of which is representative of one of these focussed areas.
The system is organized so that the product of one level becomes a dependency of the next, as outlined by the following sequence:

Level 0: From Part-of-Speech annotated text, minimal prepositional phrases are found at the syntactic level according to a context-free grammar (CFG), but are not yet categorized.

Level 1: Minimal prepositional phrases are augmented with a set of labels indicating classes of semantic roles, by means of rule-based heuristics.

Level 2: In case of ambiguity, the proper attachment of the prepositional phrase is attempted with shallow heuristics based on the results of Levels 0 and 1.

Level 3: Semantic characteristics of the PP and its co-predicate phrases are analyzed in order to perform attachment ’intelligently’, and to discover more thorough semantic relations.

Each level is described in more detail in Sections 2.1 through 2.4.

arXiv:1002.1095v1 [cs.CL] 4 Feb 2010
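To make Level 0 concrete, the following is a minimal sketch of how uncategorized minimal PPs might be pulled out of POS-annotated text. It is an illustration only: the grammar (PP -> IN NP, with a flat NP of optional determiner, adjectives, and one or more nouns), the Penn Treebank tag names, and the function names `match_np` and `find_minimal_pps` are all assumptions made for this example and are not taken from the system described here.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# Hypothetical toy grammar, assuming Penn Treebank style tags:
#   PP -> IN NP
#   NP -> (DT | PRP$)? ADJ* NOUN+
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
DET_TAGS = {"DT", "PRP$"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def match_np(tokens: List[Token], i: int) -> int:
    """Return the index just past an NP starting at i, or -1 if none."""
    j = i
    # Optional determiner or possessive pronoun.
    if j < len(tokens) and tokens[j][1] in DET_TAGS:
        j += 1
    # Zero or more adjectives.
    while j < len(tokens) and tokens[j][1] in ADJ_TAGS:
        j += 1
    # One or more nouns are required for the NP to match.
    first_noun = j
    while j < len(tokens) and tokens[j][1] in NOUN_TAGS:
        j += 1
    return j if j > first_noun else -1


def find_minimal_pps(tokens: List[Token]) -> List[List[Token]]:
    """Level 0 sketch: collect minimal PPs without categorizing them."""
    pps: List[List[Token]] = []
    i = 0
    while i < len(tokens):
        if tokens[i][1] == "IN":
            end = match_np(tokens, i + 1)
            if end != -1:
                pps.append(tokens[i:end])
                i = end  # resume scanning after the matched PP
                continue
        i += 1
    return pps
```

Given a tagged sentence such as "The/DT cat/NN sat/VBD on/IN the/DT old/JJ mat/NN in/IN Montreal/NNP", this sketch yields the two spans "on the old mat" and "in Montreal". The output of such a step would then feed Level 1, where each span receives its semantic role labels.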