Constructions in Latvian Treebank: the Impact of Annotation Decisions on the Dependency Parsing Performance
Lauma PRETKALNIŅA (a, 1) and Laura RITUMA (a)
(a) Institute of Mathematics and Computer Science, University of Latvia
Abstract. In this paper, we analyze how different dependency representations of several
constructions affect both the general parsing accuracy and the parsing accuracy of these
constructions. We focus on coordination constructions, complex predicates, and
punctuation mark attachment. We use the Latvian Treebank as a dataset, thus providing
insight into an inflective language with a rather free word order. Experiments with
MaltParser, a transition-based parser, show a clear difference in the learnability of the
various representations for the considered constructions. Future work includes carrying
out comparable experiments with a graph-based dependency parser such as MSTParser.
Keywords. Dependency parsing, coordination, punctuation, multiword
expressions, complex predication, annotation decisions, Latvian Treebank
Introduction
Dependency parsers are among the basic language processing tools. From the linguistic
point of view, a formal dependency model offers a variety of ways to represent various
constructions: while linguists tend to agree on how dependency analysis should be
performed for core phenomena, there are several important linguistic phenomena on which
there is no general consensus. Various studies (e.g. [1], [2]) show that different
dependency representations influence both parser accuracy and the accuracy of the tools
relying on the parser; thus, annotation decisions have far-reaching consequences. We
explore the effects of varying dependency representations for three language phenomena:
coordination constructions, punctuation mark attachment, and multiword predicates. For
coordination constructions, we include coordinated clauses. For multiword predicates, we
include compound tense forms, compound predicates, and predicates with modifiers. We
perform an intrinsic parser evaluation focused on these constructions: we train
dependency parsers on data where these constructions have been annotated in different
ways and then compare accuracy scores for the tokens involved in these constructions.
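The construction-focused evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: it assumes that the accuracy scores are the standard unlabeled and labeled attachment scores (UAS/LAS), micro-averaged over a chosen subset of token positions. The `Token` class, the `attachment_scores` function, the dependency labels, and the toy sentence are all hypothetical and are not taken from the Latvian Treebank annotation scheme.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Token:
    """One token of a dependency tree: its head position and relation label."""
    head: int    # 1-based index of the head token; 0 denotes the artificial root
    deprel: str  # dependency label (hypothetical labels, not the treebank's own)


# A scored sentence: gold tokens, predicted tokens, and an optional set of
# 1-based positions to score (e.g. tokens inside a coordination construction).
SentencePair = Tuple[List[Token], List[Token], Optional[Set[int]]]


def attachment_scores(pairs: Iterable[SentencePair]) -> Tuple[float, float]:
    """Micro-averaged (UAS, LAS) over the selected tokens of all sentences.

    If a sentence's position set is None, every token in it is scored.
    """
    total = head_ok = both_ok = 0
    for gold, pred, positions in pairs:
        if len(gold) != len(pred):
            raise ValueError("gold and predicted trees must have the same length")
        selected = range(1, len(gold) + 1) if positions is None else sorted(positions)
        for i in selected:
            g, p = gold[i - 1], pred[i - 1]
            total += 1
            if g.head == p.head:        # correct head: counts towards UAS
                head_ok += 1
                if g.deprel == p.deprel:  # correct head and label: counts towards LAS
                    both_ok += 1
    if total == 0:
        return 0.0, 0.0
    return head_ok / total, both_ok / total


# Toy sentence "Jānis un Pēteris lasa grāmatu ." ("Jānis and Pēteris read a book."),
# annotated in a hypothetical conjunction-headed coordination style.
gold = [Token(2, "conj"), Token(4, "subj"), Token(2, "conj"),
        Token(0, "root"), Token(4, "obj"), Token(4, "punct")]
pred = [Token(2, "conj"), Token(4, "subj"), Token(4, "obj"),
        Token(0, "root"), Token(4, "subj"), Token(4, "punct")]

coordination = {1, 2, 3}  # positions of the two conjuncts and the conjunction
overall = attachment_scores([(gold, pred, None)])
coord_only = attachment_scores([(gold, pred, coordination)])
print("all tokens:   UAS=%.3f LAS=%.3f" % overall)     # UAS = 5/6, LAS = 4/6
print("coordination: UAS=%.3f LAS=%.3f" % coord_only)  # UAS = 2/3, LAS = 2/3
```

Scoring the same predictions once over all tokens and once over a construction subset shows how a representation can look acceptable overall while a specific construction is parsed noticeably worse. Comparing models trained on differently converted versions of the data then amounts to running this evaluation once per representation.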
As a dataset, we use the Latvian Treebank. The native annotation model for the Latvian
Treebank is a dependency-based hybrid model [3], [4]. The constructions considered in
1 Corresponding Author: lauma@ailab.lv, Institute of Mathematics and Computer Science, University of Latvia; Raiņa bulv. 29, Rīga, LV-1459, Latvia
Human Language Technologies – The Baltic Perspective
A. Utka et al. (Eds.)
© 2014 The authors and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License.
doi:10.3233/978-1-61499-442-8-219