Constructions in Latvian Treebank: the Impact of Annotation Decisions on the Dependency Parsing Performance

Lauma PRETKALNIŅA a,1 and Laura RITUMA a
a Institute of Mathematics and Computer Science, University of Latvia

Abstract. In this paper, we analyze the impact of various dependency representations for various constructions on the general parsing accuracy and on the parsing accuracy of these constructions. We focus on the analysis of coordination constructions, complex predicates, and punctuation mark attachment. We use Latvian Treebank as a dataset, thus providing insight for an inflective language with a rather free word order. Experiments with MaltParser, a transition-based parser, show a clear difference in the learnability of various representations for the considered constructions. Future work would include carrying out comparable experiments with a graph-based dependency parser like MSTParser.

Keywords. Dependency parsing, coordination, punctuation, multiword expressions, complex predication, annotation decisions, Latvian Treebank

Introduction

Dependency parsers are among the basic language processing tools. From the linguistic point of view, a formal dependency model offers a variety of ways to represent various constructions: while linguists tend to agree on how dependency analysis should be performed on core phenomena, there are several important linguistic phenomena for which no general consensus is available. Various studies, e.g. [1], [2], show that different dependency representations influence both parser accuracy and the accuracy of the tools relying on the parser; thus the annotation decisions have far-reaching consequences.

We explore the effects of varying dependency representations for three language phenomena: coordination constructions, punctuation mark attachment, and multiword predicates. When considering coordination constructions we include coordinated clauses.
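The abstract distinguishes general parsing accuracy from parsing accuracy on the constructions themselves. The following is a minimal sketch of that distinction, not the evaluation code used in the paper. It assumes accuracy is measured as unlabeled and labeled attachment scores (the text here says only "accuracy"), and the names `Arc` and `attachment_scores`, the dependency labels, and the toy sentence are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Set, Tuple


class Arc(NamedTuple):
    """One token's analysis: its head and dependency label (hypothetical type)."""
    head: int   # 1-based position of the head token, 0 for the root
    label: str  # dependency label; the label names below are invented


def attachment_scores(
    gold: Sequence[Arc],
    pred: Sequence[Arc],
    focus: Optional[Set[int]] = None,
) -> Tuple[float, float]:
    """Return (unlabeled, labeled) attachment scores.

    With focus=None every token is scored; otherwise only the 0-based
    token positions listed in `focus` are scored, e.g. the tokens that
    take part in a coordination construction.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses differ in length")
    positions = range(len(gold)) if focus is None else sorted(focus)
    total = len(positions)
    if total == 0:
        return 0.0, 0.0
    # Unlabeled: the head matches. Labeled: head and label both match.
    unlabeled = sum(gold[i].head == pred[i].head for i in positions)
    labeled = sum(gold[i] == pred[i] for i in positions)
    return unlabeled / total, labeled / total


# Toy five-token sentence (invented data, not taken from Latvian Treebank).
gold = [Arc(2, "subj"), Arc(0, "pred"), Arc(2, "obj"),
        Arc(5, "conj"), Arc(3, "crdPart")]
pred = [Arc(2, "subj"), Arc(0, "pred"), Arc(2, "obj"),
        Arc(3, "conj"), Arc(3, "obj")]

overall = attachment_scores(gold, pred)
# Assume positions 2-4 are the tokens involved in a coordination.
coordination_only = attachment_scores(gold, pred, focus={2, 3, 4})
print("all tokens:", overall)
print("coordination tokens:", coordination_only)
```

In this toy case the scores over all tokens are higher than the scores over the coordination tokens alone, which is the kind of gap a construction-focused evaluation is meant to expose.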
When considering multiword predicates we include compound tense forms, compound predicates, and predicates with modifiers. We perform intrinsic parser evaluation with a focus on these constructions: we train dependency parsers on data where these constructions have been annotated in different ways and then compare accuracy scores for the tokens involved in these constructions. As a dataset we use Latvian Treebank. The native annotation model for Latvian Treebank is a dependency-based hybrid model [3], [4]. The constructions considered in

1 Corresponding Author: lauma@ailab.lv, Institute of Mathematics and Computer Science, University of Latvia; Raiņa bulv. 29, Rīga, LV-1459, Latvia

Human Language Technologies – The Baltic Perspective. A. Utka et al. (Eds.) © 2014 The authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License. doi:10.3233/978-1-61499-442-8-219