Clustering Fundamental Spatial n-Grams for Large Scale Cuneiform Search

Bartosz Bogacz and Hubert Mara
Interdisciplinary Center for Scientific Computing (IWR)
Forensic Computational Geometry Laboratory (FCGL)
Heidelberg University, Germany
{bartosz.bogacz|hubert.mara}@iwr.uni-heidelberg.de

Abstract—Documents written in cuneiform script are one of the largest sources on ancient history. The script is written by imprinting wedges (Latin: cunei) into clay tablets and was used for almost four millennia. This three-dimensional script is typically transcribed by hand with ink on paper. These transcriptions are available in large quantities as raster graphics from online sources like the Cuneiform Digital Library Initiative (CDLI). Virtually no cuneiform database can be searched graphically, that is, using a cuneiform character as a query. We present a framework for a large-scale and segmentation-free search of cuneiform characters. We build upon our previous work, i.e. extracting features from cuneiform, to cluster constellations of wedges. We describe cuneiform tablets in terms of spatial n-grams to efficiently query which locations in a tablet contain all n-grams of a query. These locations are then used to perform an exact matching. We provide preliminary results in the form of example query results to show the viability of our method.

I. INTRODUCTION

Documents were written in cuneiform script for more than three millennia in the ancient Middle East [1]. Cuneiform characters were typically written on clay tablets by imprinting a rectangular stylus, which leaves a wedge-shaped (Latin: cuneus) trace, i.e., a triangular marking. As clay was cheap and readily available, anybody capable of writing could produce robust documents. Therefore, the content of cuneiform tablets ranges from simple shopping lists to treaties between empires.
The Cuneiform Digital Library Initiative [2] incorporates a number of projects aimed at cataloging cuneiform documents and making them available online as tracings, 2D images, and sometimes transliterations. Currently, there are only a few rudimentary approaches for searching such databases for cuneiform characters graphically, that is, searching for characters not yet transliterated. Figure 1 shows a tracing of a cuneiform tablet, where repeating patterns of wedges are marked. Our main contribution in this work is a framework to decompose cuneiform tablets and identify repeating geometric patterns that are used for a large-scale and segmentation-free search.

Rothacker et al. [3] approach the task of spotting cuneiform by transforming the tablets into a raster representation and then using their word-spotting framework [4]. The transformation of the 3D data into a raster graphic is done by computing features based on the local gradient of the mesh, similar to the work of Mara et al. [5].

Fig. 1. Retrieval results of an example query. The query has a blue bounding box, the spotted cuneiform characters have an orange bounding box.

Unlike the approach of Rothacker et al., our method does not require an example query from the document. By representing tablets as sets of basic patterns, we tailor our approach to searching a database of thousands of cuneiform tablets.

Leydier et al. [6] present an approach to word spotting that decomposes words into zones of interest of various sizes and positions. Words are matched by aligning the set of zones of a query word to the zones detected in the target document. Then, the zones of interest themselves are compared. Instead of using predefined rules for decomposition, our method extracts and learns repeating geometric patterns in cuneiform. Additionally, our alignment method avoids an exhaustive search by using a nearest neighbor query to locate regions containing all required patterns.

II. PATTERN CLUSTERING

We represent extracted cuneiform constellations as sets of feature vectors, where each wedge is described by a 12-dimensional vector [7]. These feature vectors can be directly compared by computing the Euclidean distance. We compare constellations by computing an optimal assignment of wedges to wedges [8]. We represent cuneiform constellations as a finite set of searchable patterns. For these patterns, we introduce the concept of spatial n-grams. Similar to their counterparts in