Parsing Combinatory Categorial Grammar via Planning in Answer Set Programming

Yuliya Lierler¹ and Peter Schüller²

¹ Department of Computer Science, University of Kentucky, yulia@cs.uky.edu
² Institut für Informationssysteme, Technische Universität Wien, ps@kr.tuwien.ac.at

Abstract. Combinatory categorial grammar (CCG) is a grammar formalism used for natural language parsing. CCG assigns structured lexical categories to words and uses combinatory rules to combine these categories to parse a sentence. In this work we propose and implement a new approach to CCG parsing that relies on a prominent knowledge representation formalism, answer set programming (ASP), a declarative programming paradigm. We formulate the task of CCG parsing as a planning problem and use an ASP computational tool to compute solutions that correspond to valid parses. Unlike other approaches, this declarative method does not require implementing a specific parsing algorithm. Our approach aims at producing all semantically distinct parse trees for a given sentence. This goal raises normalization and efficiency issues, which we address by combining and extending existing strategies. We have implemented a CCG parsing toolkit, ASPCCGTK, that uses ASP as its main computational means. The C&C supertagger can be used as a preprocessor within ASPCCGTK, which allows us to achieve wide-coverage natural language parsing.

1 Introduction

Parsing, i.e., recovering the internal structure of sentences, is an important task in natural language processing. Combinatory categorial grammar (CCG) is a popular grammar formalism used for this task. It assigns basic and complex lexical categories to words in a sentence and uses a set of combinatory rules to combine these categories to parse the sentence.
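
The idea of assigning categories to words and combining them with combinatory rules can be illustrated with a minimal Python sketch. It is not part of ASPCCGTK and does not use ASP; the class names (`Atom`, `Functor`), the function names, and the toy lexicon for "the dog bit John" are assumptions made purely for illustration. Only forward and backward application are shown, which are two of the standard CCG combinatory rules.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """A basic category such as S, NP, or N."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks arg to the right, result\\arg to the left."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y followed by Y yields X."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y followed by X\\Y yields X."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def combine(left: Category, right: Category) -> Optional[Category]:
    """Try both application rules on two adjacent categories."""
    result = forward_apply(left, right)
    if result is None:
        result = backward_apply(left, right)
    return result


# Hypothetical toy lexicon (an assumption, not taken from the paper).
S, NP, N = Atom("S"), Atom("NP"), Atom("N")
the = Functor(NP, "/", N)                     # determiner: NP/N
dog = N                                       # noun: N
bit = Functor(Functor(S, "\\", NP), "/", NP)  # transitive verb: (S\NP)/NP
john = NP                                     # proper noun: NP

# One hand-picked derivation of "the dog bit John".
np_subj = combine(the, dog)      # NP/N  N      => NP
vp = combine(bit, john)          # (S\NP)/NP NP => S\NP
sentence = combine(np_subj, vp)  # NP  S\NP     => S
print(sentence)  # S
```

Here the order of rule applications is fixed by hand; a parser has to search over the possible ways of combining adjacent categories, which is the part this sketch leaves out.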
In this work we propose and implement a new approach to CCG parsing that relies on a prominent knowledge representation formalism, answer set programming (ASP), a declarative programming paradigm. Our aim is to create a wide-coverage³ parser which returns all semantically distinct parse trees for a given sentence.

³ The goal of wide-coverage parsing is to parse sentences that are not within a controlled fragment of natural language, e.g., sentences from newspaper articles.

One major challenge of natural language processing is the ambiguity of natural language. For instance, many sentences have more than one plausible internal structure, and these structures often give the same sentence different meanings. Consider the sentence "John saw the astronomer with the telescope." It can denote that John used a telescope to see the astronomer, or that John saw an astronomer who had a telescope. Without additional context, it is not obvious which meaning is the correct one. Natural language ambiguity inspires our goal to return all semantically distinct parse trees for a given sentence.

CCG-based systems OPENCCG [29] and TCCG [1, 3] (implemented in the LKB toolkit) can provide multiple parse trees for a given sentence. Both use chart parsing algorithms with CCG extensions such as modalities or hierarchies of categories. While OPENCCG is primarily geared towards generating sentences from logical forms, TCCG targets parsing. However, both implementations require lexicons⁴ with specialized categories. Generally, crafting a CCG lexicon is a time-consuming task.

⁴ A CCG lexicon is a mapping from each word that can occur in the input to one or more CCG categories.

An alternative to using a hand-crafted lexicon has been implemented in the wide-coverage CCG parser C&C [6, 7]. C&C relies on machine learning techniques for tagging an input sentence with CCG categories as well as for creating parse trees with a chart algorithm. As training data, C&C uses CCGbank, a corpus of