POWLA: Modeling Linguistic Corpora in OWL/DL

Christian Chiarcos

Information Sciences Institute, University of Southern California, 4676 Admiralty Way # 1001, Marina del Rey, CA 90292
chiarcos@daad-alumni.de

Abstract. This paper describes POWLA, a generic formalism to represent linguistic annotations in an interoperable way by means of OWL/DL. Unlike other approaches in this direction, POWLA is not tied to a specific selection of annotation layers, but is designed to support any kind of text-oriented annotation.

1 Background

Within the last 30 years, the maturation of language technology and the increasing importance of corpora in linguistic research have produced a growing number of linguistic corpora with increasingly diverse annotations. While the earliest annotations focused on part-of-speech and syntax annotation, later NLP research also included semantic, anaphoric and discourse annotations, and with the rise of statistical MT, a large number of parallel corpora became available.

In parallel, specialized technologies were developed to represent these annotations, to perform the annotation task, and to query and visualize them. Yet, the tools and representation formalisms applied were often specific to a particular type of annotation, and they offered limited possibilities to combine information from different annotation layers applied to the same piece of text. Such multi-layer corpora have become increasingly popular,¹ and, more importantly, they represent a valuable source to study interdependencies between different types of annotation. For example, a semantic parser usually takes a syntactic analysis as its input, and higher levels of linguistic analysis, e.g., coreference resolution or discourse structure, may take both types of information into consideration.
Such studies, however, require that all types of annotation applied to a particular document be integrated into a common representation that provides lossless and convenient access to the linguistic information conveyed in the annotation, without laborious conversion steps in advance.

At the moment, state-of-the-art approaches to corpus interoperability build on standoff XML [5,26] and relational databases [12,17]. The underlying data models are, however, graph-based, and this paper pursues the idea that RDF and

¹ For example, parts of the Penn Treebank [29], originally annotated for parts-of-speech and syntax, were later annotated with nominal semantics, semantic roles, time and event semantics, discourse structure and anaphoric coreference [30].

E. Simperl et al. (Eds.): ESWC 2012, LNCS 7295, pp. 225–239, 2012. © Springer-Verlag Berlin Heidelberg 2012