PropBank: Semantics of New Predicate Types

Claire Bonial (1,2), Julia Bonn (2), Kathryn Conger (2), Jena D. Hwang (1,2), Martha Palmer (1,2)
(1) Department of Linguistics, University of Colorado at Boulder
(2) Institute of Cognitive Science, University of Colorado at Boulder
{Bonial, Julia.Bonn, Kathryn.Conger, Hwangd, Martha.Palmer}@colorado.edu

Abstract
This research focuses on expanding PropBank, a corpus annotated with predicate argument structures, with new predicate types: namely, noun, adjective and complex predicates such as Light Verb Constructions. This effort is in part inspired by a sister project to PropBank, the Abstract Meaning Representation project, which also attempts to capture "who is doing what to whom" in a sentence, but does so in a way that abstracts away from syntactic structures. For example, alternate realizations of a destroying event in the form of either the verb destroy or the noun destruction would receive the same Abstract Meaning Representation. For PropBank to reach the same level of coverage and continue to serve as the bedrock for Abstract Meaning Representation, predicate types other than verbs, which have previously gone without annotation, must be annotated. This research describes the challenges therein, including the development of new annotation practices that walk the line between abstracting away from language-particular syntactic facts to explore deeper semantics, and maintaining the connection between semantics and syntactic structures that has proven very valuable for PropBank as a corpus of training data for Natural Language Processing applications.

Keywords: Predicate semantics, Semantic role labelling, Syntax, Natural Language Processing

1. Introduction
The annotated corpus PropBank (Palmer et al., 2005) represents an ongoing effort to provide the information necessary to map between the syntactic analysis of a sentence and the conceptual structure of an event relation.
Previously, the annotation effort has focused on event relations expressed solely by verbs. (A separate but related effort, NomBank, focused on nouns (Meyers et al., 2004).) However, a complete representation of event relations within and across sentences requires expanding that focus to additional syntactic realizations of the same eventuality, including expressions in the form of nouns, adjectives and multi-word expressions. Capturing the semantics of these additional predicates has presented challenges unique to each predicate type: the aim is still to assign semantic roles to all arguments and adjuncts of a predicate, but the syntactic environments in which those arguments and adjuncts are realized can differ greatly. This research discusses how these challenges were addressed to expand the PropBank corpus: first, by developing guidelines specific to the annotation of each predicate type, and second, by developing practices that will eventually allow annotation to focus on semantics alone (concepts and relations), moving beyond language-particular syntactic facts. This new direction is in part inspired by a desire for greater interoperability with the Abstract Meaning Representation (AMR) project (Banarescu et al., 2013). A primary goal of AMR is to provide training data for meaning-based machine translation; therefore, a deliberate effort is made to represent semantics in a language-independent fashion.

2. PropBank and Other Lexical Resources
There are currently five English lexical resources that provide explicit semantic role labels for use in data annotation: FrameNet, VerbNet, LIRICS, EngVallex and PropBank. These resources have been created independently and with differing goals, but they are surprisingly compatible. They differ primarily in the granularity of their semantic role labels. PropBank uses very generic labels such as Arg0 and Arg1,¹ as in:

1.
President Bush has approved duty-free treatment for imports of certain types of watches.
Relation (REL): approved
Arg0: President Bush
Arg1: duty-free treatment for imports of certain types of watches

EngVallex uses non-numbered labels (e.g. ACT (Actor), PAT (Patient), ADDR (Addressee), ORIG (Origin) and EFF (Effect)); with the exception of the first two, these labels are descriptive in their own right, irrespective of the verb whose argument they mark. In addition to providing several alternative syntactic frames and a set of semantic predicates, VerbNet marks the PropBank Arg0 as an Agent and the Arg1 as a Theme. FrameNet labels them Grantor and Action, respectively, and puts them in the Grant Permission frame. The additional semantic richness provided by VerbNet and FrameNet does not contradict PropBank and can be seen as complementary. The LIRICS project (Linguistic InfRastructure for Interoperable ResourCes and Systems) has carried out a thorough study of these different frameworks and of the relevant theoretical linguistics background, resulting in a detailed set of Semantic Role definitions.² Within the LIRICS framework, Arg0 and Arg1 would be labeled Agent and Theme, respectively, as in VerbNet.

¹ The other numbered arguments in PropBank, Arg2-5, are quite verb-specific.
² http://let.uvt.nl/general/people/bunt/docs/LIRICS_semrole.htm
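To make the annotation of example 1 and the cross-resource label correspondences described above concrete, here is a minimal Python sketch. It assumes a simple in-memory encoding: the class PropBankInstance, the table ROLE_MAPPINGS and the function relabel are hypothetical names introduced only for illustration, not part of any PropBank, VerbNet, FrameNet or LIRICS release format or API. The mappings cover only the verb approve as discussed in the text; for other verbs the correspondences can differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PropBankInstance:
    """Hypothetical container for one PropBank-style annotation:
    a relation (REL) plus its numbered arguments."""
    rel: str
    args: dict[str, str] = field(default_factory=dict)


# Example 1 from the text, encoded with the hypothetical container above.
example_1 = PropBankInstance(
    rel="approved",
    args={
        "Arg0": "President Bush",
        "Arg1": "duty-free treatment for imports of certain types of watches",
    },
)

# Labels that other resources assign to the same arguments of "approve",
# as described in the text (FrameNet additionally places the verb in the
# Grant Permission frame). The dictionary layout is an assumption.
ROLE_MAPPINGS = {
    "VerbNet": {"Arg0": "Agent", "Arg1": "Theme"},
    "FrameNet": {"Arg0": "Grantor", "Arg1": "Action"},
    "LIRICS": {"Arg0": "Agent", "Arg1": "Theme"},
}


def relabel(instance: PropBankInstance, resource: str) -> dict[str, str]:
    """Return the instance's arguments keyed by another resource's role names.
    Arguments without a known mapping keep their PropBank label."""
    mapping = ROLE_MAPPINGS[resource]
    return {mapping.get(label, label): text for label, text in instance.args.items()}


if __name__ == "__main__":
    for resource in ROLE_MAPPINGS:
        print(resource, relabel(example_1, resource))
```

Keying the table first by resource and then by PropBank label lets one generic PropBank annotation be projected onto several finer-grained role inventories, which mirrors the observation that the richer labels complement, rather than contradict, PropBank's.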