A Discourse Information Radio News Database for Linguistic Analysis

Kerstin Eckart, Arndt Riester, and Katrin Schweitzer

Abstract

In this paper we present DIRNDL, an annotated corpus resource comprising syntactic annotations as well as information status labels and prosodic information. We introduce each annotation layer and then focus on the linking of the data in a standoff approach. The corpus is based on data from radio news broadcasts, i.e. two sets of primary data: spoken radio news files and a written text version which sometimes deviates from the actual spoken data. We utilize a generic relational database management system to bridge the gap between the deviating primary data as well as between the different properties of the annotation levels. We show how the resource can support data extraction concerning the interface between information status, syntax and prosody.

1 Introduction

We present the DIRNDL corpus (Discourse Information Radio News Database for Linguistic analysis), an annotated resource of news broadcasts from Deutschlandfunk, a German radio station, prepared for the investigation of the interfaces between prosody, information status and syntax.¹

The database contains audio files (approx. 5 hours of speech; 9 speakers: 5m, 4f), which were annotated for pitch accents and prosodic boundaries following GToBI(S) (Mayer, 1995). Furthermore, it comprises a treebank based on the written manuscripts of the news (3221 sentences), which were annotated for referential information status (given-new distinction), according to Riester et al. (2010). The two types of data are aligned in a generic relational database management system described in Eckart et al. (2010).

Kerstin Eckart · Arndt Riester · Katrin Schweitzer
Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung, Azenbergstr. 12, 70174 Stuttgart
e-mail: {eckartkn,arndt.riester,katrin-schweitzer}@ims.uni-stuttgart.de

¹ News broadcasts from 25-27/03/2007; downloaded from http://www.dradio.de.