REQUIREMENTS-DRIVEN AUTOMATIC CONFIGURATION OF NATURAL LANGUAGE APPLICATIONS

Dan Cristea, Corina Forăscu, Ionuţ Pistol
University “Al. I. Cuza” of Iaşi, Faculty of Computer Science
dcristea@infoiasi.ro, corinfor@infoiasi.ro, ipistol@infoiasi.ro

Keywords: tools and resources, natural language processing, annotation schemes

Abstract: The paper proposes a model for dynamically building architectures intended to process natural language. The representation underlying the model is a hierarchy of XML annotation schemas in which the parent-child links are defined by subsumption relations. We show how the hierarchy may be augmented with processing power by marking the edges with names of processors, each realising an elementary NL processing step that transforms the annotation corresponding to the parent node into that corresponding to the child node. The paper describes a navigation algorithm in the hierarchy, which computes paths linking a start node to a destination node and automatically configures architectures of serial and parallel combinations of processors.

1 INTRODUCTION

In this paper we propose a methodology that allows for the automatic configuration of architectures of serial and parallel combinations of natural language (NL) processors, each able to perform an elementary transformation on an input file. The inputs and outputs of the modules (between the processing steps) are XML-annotated files.

GATE (Cunningham et al., 2002, 2003) is an extremely versatile environment for building and deploying NLP software and resources. It allows a large number of built-in components to be integrated into new processing pipelines that can be run on single documents or corpora. To build a pipeline, the user must select the modules (called resources in GATE) needed as parts of the processing chain, arrange them in the correct processing order, and instantiate their parameters.
Once this is done, the configured chain of processes can be run on an input file, producing an XML-annotated output file.

The model we propose comprises a combination of processing steps and filtering steps: processing steps add information, while filtering steps remove it. Our approach is based on Cristea and Butnariu’s (2004) hierarchy of annotation schemas. In this model, XML annotation schemas are nodes in a directed acyclic graph, and the hierarchical links are subsumption relations between schemas. The model allows classification, simplification and merging operations to be performed on files that observe the restrictions of the annotation schemas included in the hierarchy.

We describe how the graph may be augmented with processing power by marking the edges linking parent nodes to daughter nodes with names of processors, each realising an elementary NL processing step. On the augmented graph, three operations are defined: simplification, pipeline and merge. We then present a navigation algorithm in this hierarchy, which computes paths between a start node, corresponding to an input file, and a destination node, corresponding to an output file. These computed paths correspond to sequences of operations, which are equivalent to architectures of serial and parallel combinations of processors.

When an input file is given to a system that implements these principles, and the requirements of an output annotation are specified as the destination node, the system proceeds in four steps. First, the XML annotation schema of the input file is determined. Second, this schema is classified into the hierarchy, becoming the start node. Third, the expression of operations corresponding to the minimum paths linking the start node to the destination node (the architecture) is computed. Finally, the input file is fed to this architecture, yielding the expected output file.
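The idea of navigating an augmented hierarchy can be illustrated with a minimal sketch. It is not the navigation algorithm of this paper: it assumes that each schema can be modelled as a set of annotation layers, that subsumption reduces to set inclusion, and that a plain breadth-first search is enough to find a shortest sequence of steps. All schema names, processor names and function names below are hypothetical, and the merge operation (parallel combination) is not modelled.

```python
from collections import deque

# Hypothetical schemas: each node is modelled as the set of annotation
# layers that a conforming file carries (an assumption for illustration).
SCHEMAS = {
    "raw": frozenset(),
    "tok": frozenset({"token"}),
    "pos": frozenset({"token", "pos"}),
    "np": frozenset({"token", "pos", "np"}),
    "ne": frozenset({"token", "ne"}),
}

# Hypothetical parent -> child edges, each labelled with the processor
# that turns the parent's annotation into the child's.
EDGES = {
    ("raw", "tok"): "tokenizer",
    ("tok", "pos"): "pos_tagger",
    ("pos", "np"): "np_chunker",
    ("tok", "ne"): "ne_recognizer",
}


def subsumes(parent, child):
    """In this toy model a parent subsumes a child if its layers are a subset."""
    return SCHEMAS[parent] <= SCHEMAS[child]


def neighbours(node):
    """Yield (next_node, step) pairs reachable from node in one operation."""
    for (parent, child), processor in EDGES.items():
        if parent == node:
            yield child, ("process", processor)   # adds information
        elif child == node:
            yield parent, ("simplify", parent)    # removes information


def shortest_plan(start, goal):
    """Breadth-first search for a shortest sequence of steps, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for nxt, step in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [step]))
    return None
```

In this toy hierarchy, `shortest_plan("pos", "ne")` first simplifies the POS-tagged schema back to the tokenised one and then applies the hypothetical `ne_recognizer`, which mirrors the combination of filtering and processing steps described above.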
Section 2 of the paper reviews the hierarchical model of annotation schemas, while Section 3 presents the hierarchy augmented with processing power. In Section 4, the operations associated with the