Poster ID: 114
Paper ID: 1204

Towards Creating A Best Practice Digital Processing Pipeline For Cuneiform Languages
Timo Homburg, M. Sc.
timo.homburg@hs-mainz.de
Hochschule Mainz

Links and Acknowledgements

Links
QR-Code: Sample Homepage based on Jekyll

Acknowledgements
This workflow was developed with contributions from:
• Doris Prechel (JGU Mainz)
• Tim Brandes (JGU Mainz)
• Ali Zalaghi (JGU Mainz)
• Kai Christian Bruhn (HS Mainz)

References
[1] Hubert Mara, Susanne Krömker, Stefan Jakob, and Bernd Breuckmann. GigaMesh and Gilgamesh: 3D multiscale integral invariant cuneiform character extraction. In Proceedings of the 11th International Conference on Virtual Reality, Archaeology and Cultural Heritage, pages 131–138. Eurographics Association, 2010.
[2] Timo Homburg. PaleoCodage - a machine-readable way to describe cuneiform characters paleographically. DH 2019, July 2019.
[3] Timo Homburg and Christian Chiarcos. Word segmentation for Akkadian cuneiform. In LREC 2016, 2016.
[4] Timo Homburg. POS tagging and semantic dictionary creation for Hittite cuneiform. In Book of Abstracts of DH2017, Montréal, Canada, August 2017. Alliance of Digital Humanities Organizations.
[5] John McCrae, Dennis Spohr, and Philipp Cimiano. Linking lexical resources and ontologies on the semantic web with lemon. In Extended Semantic Web Conference, pages 245–259. Springer, 2011.
[6] Timo Homburg, Christian Chiarcos, Thomas Richter, and Dirk Wicke. Learning cuneiform the modern way. https://gams.uni-graz.at/o:dhd2015.p.55, 2015.

Introduction
Cuneiform languages have recently become a topic of investigation in the digital humanities and in computational linguistics [1,6]. Several aspects of the processing of cuneiform languages, such as 3D scanning of cuneiform tablets [1], paleographic description [2], word segmentation [3], and part-of-speech tagging [4], have been investigated with and without machine learning approaches.
In addition, several standards for representing cuneiform languages exist, such as TEI/XML, ATF* and RDF, each with distinct advantages and disadvantages. While the current representations are useful for their respective research communities (e.g. ATF for cuneiform scholars), a best-practice way to produce all the required formats, from a cuneiform clay tablet to semantic web data sources that are useful for a variety of communities, is currently missing. On this poster we present our approach to such a cuneiform processing pipeline, which we intend to implement in an upcoming research project that processes cuneiform clay tablets from Haft Tappeh in Iran.

* http://oracc.museum.upenn.edu/doc/help/editinginatf/cdliatf/index.html

Digital Processing Pipeline - From Cuneiform Tablet to X

Pipeline described
• Cuneiform Tablet: Cuneiform tablets are assigned IDs and metadata to be processed in a further step.
• 3D Scan: Cuneiform tablets are 3D scanned and a subset of tablets is annotated.
• Transliteration and Paleography: Cuneiform tablets are transcribed along with metadata and published as ATF. Paleographic differences are described using PaleoCodage [2] as distinct transliterations.
• Annotation and Enrichment: ATF transliterations are converted to TEI/XML and enriched with semantic and linguistic annotations.
• Linked Data Creation and Publishing: Annotated TEI/XML documents are converted to RDF in order to create dictionary resources and sign lists.
• Applications and Analysis: Linked Data is used as a basis to compute statistics about the texts being analyzed and to produce further outputs which may be useful in other application contexts.

Pipeline visualized
Each stage of the pipeline and its result:
• Cuneiform Tablet. Result: Tablet List
• 3D Scan. Result: Annotated 3D Scans
• Transliteration and Paleography. Result: ATF with metadata
• Annotation and Enrichment. Result: TEI/XML with linguistic/semantic annotations
• Linked Data Creation and Publishing. Result: Linked Data
• Applications and Analysis. Result: Views and domain-specific results

Motivation and Workflow
The digital editing pipeline is set up by defining and documenting workflows and by establishing a technically lean environment, building on existing solutions. It takes a consistently data-centric approach and envisages investing computational resources in components that are closely aligned with the requirements of philology and with other international activities in this field. To this end, the whole workflow is created in a Git-based environment, so that the process can easily be replicated in other projects.

Sample Project Workflow
Three intermediate results:
1. Basic Edition (ATF format): Cuneiform tablets are transcribed into ATF, the standard format for cuneiform tablet documentation and commentary.
2. Enhanced Edition (TEI/XML): ATF representations are automatically converted into a TEI/XML template. Using a TEI/XML editor, semantic and linguistic annotations can then be applied.
3. Extensive Edition (TTL/RDF): Annotated TEI/XML is converted to RDF and inserted into a triple store to provide access for the linked data community.

Figure 1: Proposed Workflow of the Haft Tappeh Project
    #atf: lang sux
    @tablet
    @obverse

Annotations and Ontology Model
We use existing dictionary resources for Akkadian/Hittite/Sumerian cuneiform and convert them according to the Lexicon Model for Ontologies (Lemon) [5] standard to include the following information.

Which information is annotated/generated?
• Semantic concepts from e.g. Wikidata
• POS tags (Gender, Time, Person, Case) to annotate cuneiform texts
• Etymology, Paleography and Metadata (Dynasty, Dialect, Place, etc.)
• Generated: Text statistics, Cuneiform Fonts, Input Methods, Machine Learning data (linguistics, image recognition)

How does the annotation work?
Figure 2: Annotation using CWRCWriter, which we extend with a linguistic annotation component and cuneiform TEI/XML templates
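The three intermediate results of the sample workflow (ATF, TEI/XML, RDF) can be illustrated with a minimal Python sketch. It is not the project's implementation, and it rests on several assumptions:
• It handles only a small subset of ATF: the "&" identifier line, "#atf: lang", "@" surface markers and numbered text lines.
• The sample text is invented for illustration.
• The XML uses TEI element names (text, body, div, l, w) in a simplified way, without a TEI header or namespace.
• The example.org namespace and the properties partOf and writtenForm are hypothetical placeholders.
• The RDF is emitted as N-Triples, which is also valid Turtle.

```python
import xml.etree.ElementTree as ET

# Invented sample in a small ATF subset (not taken from a real tablet).
SAMPLE_ATF = """\
&P000001 = Hypothetical tablet
#atf: lang sux
@tablet
@obverse
1. lugal e2-gal
2. dub-sar
@reverse
1. lugal kur
"""

BASE = "http://example.org/cuneiform/"  # hypothetical namespace


def parse_atf(text):
    """Step 1 (basic edition): read a tiny ATF subset into a dictionary."""
    doc = {"id": None, "lang": None, "surfaces": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("&"):
            # "&P000001 = designation": keep only the identifier
            doc["id"] = line[1:].split("=")[0].strip()
        elif line.startswith("#atf:"):
            parts = line[len("#atf:"):].split()
            if len(parts) == 2 and parts[0] == "lang":
                doc["lang"] = parts[1]
        elif line.startswith("@"):
            name = line[1:]
            if name != "tablet":  # "@tablet" marks the object, not a surface
                doc["surfaces"].append({"name": name, "lines": []})
        elif line[0].isdigit() and doc["surfaces"]:
            number, _, content = line.partition(". ")
            doc["surfaces"][-1]["lines"].append((number, content.split()))
    return doc


def atf_to_tei(doc):
    """Step 2 (enhanced edition): build a simplified TEI-like XML tree."""
    text = ET.Element("text", {"xml:id": doc["id"] or "unknown",
                               "xml:lang": doc["lang"] or "und"})
    body = ET.SubElement(text, "body")
    for surface in doc["surfaces"]:
        div = ET.SubElement(body, "div", {"type": surface["name"]})
        for number, words in surface["lines"]:
            line_el = ET.SubElement(div, "l", {"n": number})
            for word in words:
                ET.SubElement(line_el, "w").text = word
    return text


def tei_to_ntriples(tei):
    """Step 3 (extensive edition): emit one token resource per word.

    Literal escaping is omitted because the sample contains no quotes
    or backslashes; real data would need it.
    """
    tablet_id = tei.get("xml:id")
    lang = tei.get("xml:lang")
    tablet = f"<{BASE}tablet/{tablet_id}>"
    triples = []
    for div in tei.iter("div"):
        surface = div.get("type")
        for line_el in div.iter("l"):
            number = line_el.get("n")
            for pos, w in enumerate(line_el.iter("w"), start=1):
                token = f"<{BASE}token/{tablet_id}/{surface}/{number}/{pos}>"
                triples.append(f"{token} <{BASE}partOf> {tablet} .")
                triples.append(f'{token} <{BASE}writtenForm> "{w.text}"@{lang} .')
    return triples


if __name__ == "__main__":
    parsed = parse_atf(SAMPLE_ATF)
    tei_tree = atf_to_tei(parsed)
    print(ET.tostring(tei_tree, encoding="unicode"))
    print("\n".join(tei_to_ntriples(tei_tree)))
```

In a setup like the one described on this poster, the semantic and linguistic annotation would happen between steps 2 and 3 in a TEI/XML editor, and the resulting triples would be loaded into a triple store instead of being printed.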