Why is German Dependency Parsing More Reliable
than Constituent Parsing?
Sandra Kübler
Indiana University
skuebler@indiana.edu
Jelena Prokić
Rijksuniversiteit Groningen
j.prokic@rug.nl
1 Introduction
In recent years, research in parsing has extended in several new directions. One of
these directions is concerned with parsing languages other than English. Treebanks
have become available for many European languages, but also for Arabic, Chinese,
and Japanese. However, it has been shown that parsing results on these treebanks depend
on the types of treebank annotations used [, ]. Another direction in parsing
research is the development of dependency parsers. Dependency parsing profits from
the non-hierarchical nature of dependency relations, so lexical information can
be included in the parsing process in a much more natural way. Machine-learning-based
approaches have been especially successful (cf. e.g. [12, 13]). The results
achieved by these dependency parsers are very competitive, although comparisons
are difficult because of the differences in annotation. For English, the Penn
Treebank [11] has been converted to dependencies. For this version, Nivre et al. [14]
report an accuracy rate of 86.3%, as compared to an F-score of 2.1 for Charniak’s
parser [1]. The Penn Chinese Treebank [1] is also available in both a constituent and
a dependency representation. The best results reported for parsing experiments
with this treebank give an F-score of 81.8 for the constituent version [2] and .%
accuracy for the dependency version [14]. The general trend in comparisons
between constituent and dependency parsers is that the dependency parser performs
slightly worse than the constituent parser. The only exception occurs for German,
where F-scores for constituent plus grammatical function parses range between
51.4 and 5.3, depending on the treebank (NEGRA [1] or TüBa-D/Z [1]). The
dependency parser based on a converted version of TüBa-D/Z, in contrast, reached
an accuracy of 3.4% [14], i.e. 12 percentage points better than the best constituent
analysis including grammatical functions.
Hajič, J. and Nivre, J. (eds.): Proceedings of the TLT 2006, pp. 7–18.
© Institute of Formal and Applied Linguistics, Prague, Czech Republic 2006