Structured Kernel-Based Learning for the Frame Labeling over Italian Texts Danilo Croce, Emanuele Bastianelli, and Giuseppe Castellucci Department of Enterprise Engineering University of Roma, Tor Vergata Via del Politecnico 1, 00133 Roma croce@info.uniroma2.it {emanuele.bastianelli,castellucci.giuseppe}@gmail.com Abstract. In this paper two systems participating to the Evalita Frame Label- ing over Italian Texts challenge are presented. The first one, i.e. the SVM-SPTK system, implements the Smoothed Partial Tree Kernel that models semantic roles by implicitly combining syntactic and lexical information of annotated examples. The second one, i.e. the SVM-HMM system, realizes a flexible approach based on the Markovian formulation of the SVM learning algorithm. In the challenge, the SVM-SPTK system obtains state-of-the-art results in almost all tasks. Perfor- mances of the SVM-HMM system are interesting too, i.e. the second best scores in the Frame Prediction and Argument Classification tasks, especially consider- ing it does not rely on a full syntactic parsing. Keywords: Semantic Role Labeling, Structured Kernel-Based Learning, SVM 1 Introduction Language learning systems usually generalize linguistic observations into statistical models of higher level semantic tasks, such as Semantic Role Labeling (SRL). Lexical or grammatical aspects of training data are the basic features for modeling the different inferences, then generalized into predictive patterns composing the final induced model. In SRL, the role of grammatical features has been outlined since the seminal work in [1], where symbolic expressions derived from parse trees denote the position and the relationship between a predicate and its arguments, and they are used as features. As discussed in [2–4], syntactic information of annotated examples can be effec- tively generalized in SRL through the adoption of tree kernel based learning ([5]), with- out the need of manual feature engineering: as tree kernels model similarity between two training examples as a function of their shared tree fragments, discriminative infor- mation are automatically selected by the learning algorithm, e.g., Support Vector Ma- chines (SVMs). However, when the availability of training data is limited, the informa- tion derived from structural patterns cannot be sufficient to discriminate examples. Ac- cording to the Frame Semantics [6], two Italian phrases like “Saette feriscono altri es- cursionisti 1 and “Soldati feriscono altri escursionisti 2 both evoke the CAUSE HARM 1 In English: Lightnings hurt other excursionists 2 In English: Soldiers hurt other excursionists