Semantic Role Labelling using Higher Order SemiCRFs

Sumit Bhagwani*
Computer Science and Engineering
IIT Kanpur, Kanpur - 208016, India
sumitb@cse.iitk.ac.in
sumitbha@comp.nus.edu.sg

* The work was done while the author was an intern at the School of Computing, National University of Singapore, under the guidance of Dr. Lee Wee Sun.

Abstract

This work studies the effectiveness of Higher Order SemiCRFs in learning Semantic Role Labelling, exploiting the parse structure of a sentence. We work on the OntoNotes dataset because of its high inter-annotator agreement. We compare our systems against the CoNLL-2005 baseline and against CRFs using the same features.

1 Introduction

In recent years there has been increasing interest in semantic parsing of natural language, which is a key issue in Information Extraction, Question Answering, Summarization, and, in general, in all NLP applications requiring some kind of semantic interpretation (Carreras and Marquez, 2004).

The problem of Semantic Role Labelling (SRL henceforth) can be described as follows: given a sentence and a target verb, identify the arguments of the verb in the sentence, where the set of argument labels is fixed. The problem is often addressed as two subproblems: identifying the boundaries of the arguments, followed by assigning semantic roles to the arguments with respect to the verb in the sentence.

Given the sentence and the verb, SRL can be viewed as a sequence labelling task, with the labels being the argument types. However, the label assignments are not completely independent: they depend on the verb, the sentence structure, the voice of the verb, etc. Also, since arguments are generally sequences of words rather than single words, it makes sense to identify phrase/argument boundaries and label them instead. We wish to capture both ideas using higher order dependencies in a SemiCRF, viewing SRL as a linear tagging task.
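
The contrast between labelling individual words and labelling whole segments can be illustrated with a minimal sketch. The example sentence, the PropBank-style label names (A0, A1, AM-TMP, V) and the helper names segments_to_bio and bio_to_segments are assumptions made for illustration only; they are not taken from the system described in this paper. The sketch only shows how one analysis is represented in a token-level (CRF-style) view and a segment-level (SemiCRF-style) view, not how either model is trained.

```python
from typing import List, Tuple

# A segment is (start, end_exclusive, label). Labels here are illustrative.
Segment = Tuple[int, int, str]


def segments_to_bio(n_tokens: int, segments: List[Segment]) -> List[str]:
    """Expand labelled segments into one BIO tag per token (token-level view)."""
    tags = ["O"] * n_tokens
    for start, end, label in segments:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def bio_to_segments(tags: List[str]) -> List[Segment]:
    """Collapse BIO tags into labelled segments (segment-level view)."""
    segments: List[Segment] = []
    i = 0
    while i < len(tags):
        if tags[i] == "O":
            i += 1
            continue
        label = tags[i][2:]  # strip the "B-" or "I-" prefix
        j = i + 1
        # Extend the segment while the following tags continue the same label.
        while j < len(tags) and tags[j] == "I-" + label:
            j += 1
        segments.append((i, j, label))
        i = j
    return segments


# Hypothetical sentence with target verb "approved".
tokens = ["The", "board", "approved", "the", "merger", "yesterday"]
segments = [(0, 2, "A0"), (2, 3, "V"), (3, 5, "A1"), (5, 6, "AM-TMP")]
tags = segments_to_bio(len(tokens), segments)
```

In the token-level view a model must make six tagging decisions and keep them mutually consistent, whereas in the segment-level view it scores four labelled spans directly, which is the property that motivates treating arguments as segments.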

SRL is a multilabel classification task in which the argument structure generally depends on the verb of the sentence, which makes it inherently hard. The choice of argument types varies across researchers and datasets, which makes the problem even harder to generalize.

Section 2 discusses previous work in this area. Section 3 describes the dataset and the preprocessing done. Section 4 describes the baseline systems for the task and the evaluation measures. Sections 5 and 6 introduce the features produced and discuss the results of the baselines, SemiCRFs and CRFs, respectively. Finally, Section 7 concludes the work and Section 8 offers suggestions for future work.

2 Related Work

One of the earliest works on automatic Semantic Role Labeling is (Gildea and Jurafsky, 2000). They approached the problem as a combination of two subproblems, namely segment boundary de-