Assessing Sparse Coding Methods for Contextual Shape Indexing of Maya Hieroglyphs

Edgar Roman-Rangel, Jean-Marc Odobez, Daniel Gatica-Perez
Idiap Research Institute, Martigny, Switzerland
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Email: {eroman, odobez, gatica}@idiap.ch

Abstract— Bag-of-visual-words or bag-of-visterms (bov) is a common technique for indexing multimedia information for retrieval and classification. In this work we address the problem of constructing efficient bov representations of complex shapes such as the Maya syllabic hieroglyphs. Based on retrieval experiments, we evaluate the performance of several variants of the recent sparse coding method KSVD and compare them with the traditional k-means clustering algorithm. We investigate the effects of a thresholding procedure used to facilitate the sparse decomposition of signals that are potentially sparse, and we also assess the performance of different pooling techniques for constructing bov representations. Although the bov representations computed via sparse coding do not outperform the retrieval precision of those computed by k-means, they achieve competitive results after adequate enforcement of sparsity, which leads to more discriminative bag representations than those obtained from the original non-sparse descriptors. We also propose a simplified formulation of the HOOSC descriptor that improves retrieval performance.

Index Terms— indexing, clustering, sparse coding, shape descriptor, Maya culture, hieroglyph.

I. INTRODUCTION

The collection of digital imagery has been boosted in recent years by a new generation of devices that make it possible to gather thousands of high-quality images, generating the need for efficient tools to index large image data sets and to retrieve images that are similar to a given query in terms of visual content.
This phenomenon is widespread across fields such as photography, painting, the arts, and archaeology.

One instance of this phenomenon is the AJIMAYA project (Hieroglyphic and Iconographic Maya Heritage), conducted by the National Institute of Anthropology and History of Mexico (INAH). Despite the success of the project in gathering a collection of images of all existing monuments in some of the archaeological Maya sites within Mexican territory, the manual cataloging of the hieroglyphs remains to be accomplished, mainly due to the large amount of information that has been generated and the lack of automatic and semiautomatic tools to support the cataloging goal. For instance, Fig. 1 shows a Maya inscription with a large number of hieroglyphs.

Figure 1. Maya inscription found in a lintel in Yaxchilan. The inscription is rich in hieroglyphs which are cataloged manually. © AJIMAYA.

The Maya writing system is composed of two main types of hieroglyphs: logograms (words) and syllabograms (syllables). The blocks found in inscriptions usually exhibit one or two logograms accompanied by one to four syllabograms, complementing each other to build coherent sentences. Fig. 2(a) shows four vertically arranged blocks, each containing both syllabograms and logograms. A third type of Maya glyph, which corresponds to Maya art, is known as iconography, e.g., Fig. 2(b). In our work we focus on the description and retrieval of Maya syllabograms.

Currently, an estimated 1000 different hieroglyphs have been discovered, of which only about 80% have been deciphered. The remaining 20% are still unknown, and archaeologists continue to find new hieroglyphs that need to be identified and classified.

In this paper, we present recent advancements made toward the design of an efficient content-based retrieval engine for epigraphic versions of Maya hieroglyphs.
We conducted a systematic study to assess the quality of recently proposed techniques for representing and retrieving images. Specifically, we assess bag-of-visterms representations constructed with two indexing techniques: the KSVD algorithm, which is a recent method for sparse coding [1], and the traditional k-means clustering [2].

According to [3], sparse coding is a method for representing signals as sparse linear combinations of an overcomplete set of basis functions called a dictionary. The method is inspired by research in the neuroscience community, which suggests that the receptive field on

JOURNAL OF MULTIMEDIA, VOL. 7, NO. 2, APRIL 2012 179 © 2012 ACADEMY PUBLISHER doi:10.4304/jmm.7.2.179-192