SALTO – A Versatile Multi-Level Annotation Tool

Aljoscha Burchardt, Katrin Erk, Anette Frank*, Andrea Kowalski, and Sebastian Pado

Dept. of Computational Linguistics, Saarland University
* and DFKI
Saarbrücken, Germany
{albu, erk, frank, kowalski, pado}@coli.uni-sb.de

Abstract

In this paper, we describe the SALTO tool. It was originally developed for the annotation of semantic roles in the frame semantics paradigm, but can be used for graphical annotation of treebanks with general relational information in a simple drag-and-drop fashion. The tool additionally supports corpus management and quality control.

1. Introduction

We present SALTO, a tool for manual annotation within an intuitive, easy to use graphical environment. Its purpose is to support the annotation of a second structural layer on top of an existing syntactic structure. Originally developed for the annotation of semantic roles and semantic classes in the FrameNet paradigm (Baker et al., 1998), it can be used for related tasks, such as annotation of discourse structure or anaphoric relations.

The key features of SALTO include:

- Query-based selection of data sets for annotation.
- Definition of tag sets for the annotation.
- Distribution of corpora to annotators.
- Comfortable annotation with visual editor and mouse menus.
- Quality control: inspection and correction of disagreements between annotators.

Many annotation tools, e.g. MMAX (Müller and Strube, 2001), work with text-based representation and thus have to resort to bracketing to represent more complex structure. SALTO, like the Annotate tool (Brants and Plaehn, 2000), represents syntactic structure graphically, as shown in Figure 1. But while Annotate supports the graphical annotation of plain text with syntactic structure, SALTO displays a fixed syntactic structure and allows the annotation of a second layer of structure on top of the first one, with the second layer referring to arbitrary nodes of the first layer.
This paper is structured as follows: Section 2 characterizes the kind of annotation tasks that SALTO can be used for, both theoretically and via two walk-through examples. In Section 3 we list the most important features that SALTO offers to support annotation. Section 4 describes the overall workflow that SALTO presupposes and supports as well as the quality control mode of the tool. Section 5 contains details on obtaining the SALTO tool.

2. SALTO: Annotation on Top of Syntax

In this section, we describe the type of annotation tasks that SALTO supports. First, we characterize the types of annotation which SALTO can be used for, including assumptions about input data; then, we provide detailed examples for two different example tasks.

2.1. Annotation Tasks that SALTO Supports

SALTO offers a graphical environment for linguistic annotation. The tool assumes that input corpora are syntactically annotated, then adds a second layer of structure, which can refer to arbitrary nodes in the syntactic structure. SALTO supports any annotation task which can be phrased in terms of one or more trees, as long as each tree can be anchored at some overt expression in the sentence.

SALTO accepts input in TIGER XML (Mengel and Lezius, 2000) as well as its own output format, SALSA/TIGER XML (Erk and Pado, 2004). TIGER XML conceptualizes syntactic structure as a directed graph. It is capable of describing constituents as well as dependency structure and flexible enough to handle discontinuous constituents.

Transformation from many treebank formats to TIGER XML is available via TIGERRegistry, a component of TIGERSearch (Lezius, 2002). SALTO can also handle “pseudo”-analyses of unparsed sentences, consisting only of a sentence node and the terminals, so that annotation of data without syntactic analysis is easily possible as well.

2.2. Example 1: Semantic Role Annotation

SALTO was originally developed for the manual annotation of semantic roles in the context of the SALSA project¹ (Erk et al., 2003), which aims at annotating a large German corpus with role-semantic information in the Berkeley FrameNet (Baker et al., 1998) paradigm. The FrameNet resource associates words and expressions with semantic classes called frames and lists semantic roles, called frame elements, for each semantic class.

¹ www.coli.uni-sb.de/projects/salsa, funded by the German Science Foundation DFG, Title PI 154/9-2.

Figure 1 shows a screenshot of SALTO, displaying a sentence drawn from the TIGER corpus (Brants et al., 2002) and annotated with two frames: He bought the wine bar in order to close it.

Syntactic structure. The syntactic structure of the sentence in Figure 1 is shown as a tree with straight edges. The node labels (shown as dark circles) give the syntactic categories of constituents. Edge labels describing dependency relations can optionally be displayed but are disabled by default to avoid cluttering the picture.
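The two-layer idea described in Section 2 (a fixed syntactic graph, plus frames whose target and frame elements point to arbitrary nodes of that graph) can be illustrated with a minimal Python sketch. This is not SALTO's internal data model and not the SALSA/TIGER XML format: the names SyntaxGraph, Frame, pseudo_analysis and resolve are hypothetical, the NP grouping of "the wine bar" is an assumed constituent rather than the TIGER analysis, and the Commerce_buy role assignment is only an illustration, not a reproduction of Figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class SyntaxGraph:
    """First layer: a fixed syntactic structure stored as a directed graph."""
    terminals: dict[str, str]      # node id -> word, in sentence order
    nonterminals: dict[str, str]   # node id -> syntactic category
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (parent, child, label)

    def node_ids(self) -> set[str]:
        return set(self.terminals) | set(self.nonterminals)

    def covered_words(self, node_id: str) -> list[str]:
        """Words dominated by a node, returned in sentence order."""
        stack, found = [node_id], set()
        while stack:
            current = stack.pop()
            if current in self.terminals:
                found.add(current)
            stack.extend(child for parent, child, _ in self.edges if parent == current)
        return [word for tid, word in self.terminals.items() if tid in found]


@dataclass
class Frame:
    """Second layer: a frame anchored at a target, with roles pointing to nodes."""
    name: str
    target: list[str]                # node ids of the frame-evoking expression
    elements: dict[str, list[str]]   # frame element name -> node ids


def pseudo_analysis(words: list[str]) -> SyntaxGraph:
    """Build a flat analysis: one sentence node directly above all terminals."""
    terminals = {f"t{i}": word for i, word in enumerate(words, start=1)}
    edges = [("s", tid, "") for tid in terminals]
    return SyntaxGraph(terminals, {"s": "S"}, edges)


def resolve(graph: SyntaxGraph, frame: Frame) -> dict[str, str]:
    """Check that every reference exists, then map each role to the text it covers."""
    referenced = [n for ids in [frame.target, *frame.elements.values()] for n in ids]
    unknown = [n for n in referenced if n not in graph.node_ids()]
    if unknown:
        raise ValueError(f"frame {frame.name} refers to unknown nodes: {unknown}")
    return {
        role: " ".join(w for n in ids for w in graph.covered_words(n))
        for role, ids in frame.elements.items()
    }


# English gloss of the example sentence, starting from a flat pseudo-analysis.
graph = pseudo_analysis("He bought the wine bar in order to close it".split())

# Assumed constituent: group "the wine bar" (t3 to t5) under a hypothetical NP node.
graph.nonterminals["np"] = "NP"
graph.edges = [e for e in graph.edges if e[1] not in {"t3", "t4", "t5"}]
graph.edges += [("s", "np", ""), ("np", "t3", ""), ("np", "t4", ""), ("np", "t5", "")]

# Illustrative frame: the target is a terminal, one role points to a nonterminal.
buy = Frame("Commerce_buy", target=["t2"], elements={"Buyer": ["t1"], "Goods": ["np"]})

print(resolve(graph, buy))  # {'Buyer': 'He', 'Goods': 'the wine bar'}
```

Because the frame stores node identifiers rather than character spans, the syntactic layer stays untouched by annotation, and the same structure works for parsed input and for flat pseudo-analyses of unparsed sentences.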