The Spanish Resource Grammar

Montserrat Marimon
Universitat de Barcelona
Gran Via de les Corts Catalanes 585
08007-Barcelona
montserrat.marimon@ub.edu

Abstract

This paper describes the Spanish Resource Grammar, an open-source, multi-purpose, broad-coverage precise grammar for Spanish. The grammar is implemented on the Linguistic Knowledge Builder (LKB) system, is grounded in the theoretical framework of Head-driven Phrase Structure Grammar (HPSG), and uses Minimal Recursion Semantics (MRS) for the semantic representation. We have developed a hybrid architecture which integrates shallow processing functionalities – morphological analysis, and Named Entity recognition and classification – into the parsing process. The SRG has a full-coverage lexicon of closed word classes and contains 50,852 lexical entries for open word classes. The grammar also has 64 lexical rules that perform valence-changing operations on lexical items, and 191 phrase structure rules that combine words and phrases into larger constituents and compositionally build up their semantic representation. The annotation of each parsed sentence in an LKB grammar simultaneously represents both a traditional phrase structure tree and an MRS semantic representation. We provide evaluation results on sentences from newspaper texts and discuss future work.

1. Introduction

This paper describes the Spanish Resource Grammar (SRG). The grammar is designed to be multi-purpose (abstracted away from any particular application) and broad-coverage (aiming to cover not only all variations of the phenomena that have been implemented, but also the combinations of different phenomena).
The grammar is implemented on the Linguistic Knowledge Builder (LKB) system (Copestake, 2002), an interactive grammar development environment for typed feature structure grammars, which includes a parser and a generator, visualization tools for all relevant data structures, and a set of specialized debugging facilities. It is grounded in the theoretical framework of Head-driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1987; Pollard and Sag, 1994), a constraint-based, lexicalist approach to grammatical theory in which all linguistic objects (i.e. words and phrases) are represented as typed feature structures.

The SRG uses Minimal Recursion Semantics (MRS) (Copestake et al., 2006) for the semantic representation. MRS is not a semantic theory in itself, but a kind of meta-level language defined for describing semantic structures. Using unification of typed feature structures, MRS assigns a syntactically flat semantic representation to linguistic expressions.

The SRG was developed from the LinGO Grammar Matrix, an open-source starter kit for the rapid development of broad-coverage HPSG grammars compatible with the LKB system, which supplies (1) the necessary configuration files for an LKB grammar development environment, and (2) the basic grammar types and rules (Bender et al., 2002; Bender and Flickinger, 2005).1

1 The Grammar Matrix is accessible through a web-based customization system: http://www.delph-in.net/matrix/customize/matrix.cgi.

The SRG is part of the DELPH-IN open-source repository of linguistic resources and tools for writing (the LKB system), testing and benchmarking (the [incr tsdb()] competence and performance profiler (Oepen and Carroll, 2000)), and efficiently processing HPSG grammars (the PET system (Callmeier, 2000)), as well as an architecture for integrating deep and shallow natural language processing components to increase the robustness of HPSG grammars (the Heart of Gold (Schäfer, 2007)).
Further linguistic resources available in the DELPH-IN repository include broad-coverage grammars for English (Flickinger, 2002), German (Crysmann, 2005), and Japanese (Siegel and Bender, 2002), as well as smaller grammars for French, Korean (Kim and Yangs, 2003), modern Greek (Kordoni and Neu, 2005), Norwegian (Hellan and Haugereid, 2005), and Portuguese (Branco and Costa, 2008).2

2. Architecture

We have developed a hybrid architecture which integrates shallow processing functionalities – morphological analysis, and Named Entity (i.e. proper names, dates, numbers, ratios, currency, and physical magnitudes) recognition and classification – into the parsing process. See Figure 1.

Before input sentences are parsed with the LKB system, the raw text is pre-processed by the FreeLing toolkit, an open-source language analysis tool suite that performs shallow processing functionalities (Atserias et al., 2006).3 FreeLing is plugged into our system by means of the LKB Simple PreProcessor Protocol (SPPP),4 which assumes that a preprocessor runs as an external process to the LKB system. Example (1) shows the output from the SPPP for the input “gato” (cat).

(1) <segment>
      <token form="gato" from="0" to="1">
        <analysis stem="gato">
          <rule id="NCMS" form="gato"/>
        </analysis>
      </token>

2 See http://www.delph-in.net/.
3 The FreeLing toolkit may be downloaded from http://www.lsi.upc.edu/~nlp/freeling.
4 See http://wiki.delph-in.net/moin/LkbSppp.
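To make the shape of the SPPP output in (1) concrete, the following is a minimal Python sketch of how such a fragment could be read into plain data structures. It is an illustration only, not the code used by the LKB or FreeLing: the function name `read_sppp_tokens` is hypothetical, the sample assumes that the fragment in (1) is closed by a `</segment>` tag (the excerpt shown ends at `</token>`), and the rule identifier is passed through without interpretation.

```python
import xml.etree.ElementTree as ET

# Sample modelled on example (1). The closing </segment> tag is an assumption:
# the fragment shown in the text ends at </token>.
SAMPLE = """
<segment>
  <token form="gato" from="0" to="1">
    <analysis stem="gato">
      <rule id="NCMS" form="gato"/>
    </analysis>
  </token>
</segment>
"""


def read_sppp_tokens(xml_text):
    """Return one dict per <token>: surface form, span, and candidate analyses.

    Hypothetical helper for illustration; only the elements and attributes
    visible in example (1) are handled.
    """
    root = ET.fromstring(xml_text.strip())
    tokens = []
    for token in root.iter("token"):
        analyses = []
        for analysis in token.findall("analysis"):
            analyses.append({
                "stem": analysis.get("stem"),
                # Each <rule> carries an id (here a morphological tag).
                "rules": [rule.get("id") for rule in analysis.findall("rule")],
            })
        tokens.append({
            "form": token.get("form"),
            "from": int(token.get("from")),
            "to": int(token.get("to")),
            "analyses": analyses,
        })
    return tokens


if __name__ == "__main__":
    for tok in read_sppp_tokens(SAMPLE):
        print(tok)
```

Collecting the analyses as a list per token reflects the nesting visible in (1), where a token element contains analysis elements; whether a token may carry several analyses in practice is not stated in the fragment, so the sketch simply allows for it.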