Assessing Sparse Coding Methods for
Contextual Shape Indexing of Maya Hieroglyphs
Edgar Roman-Rangel, Jean-Marc Odobez, Daniel Gatica-Perez
Idiap Research Institute, Martigny, Switzerland
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Email: {eroman, odobez, gatica}@idiap.ch
Abstract— Bag-of-visual-words or bag-of-visterms (bov) is a
common technique used to index multimedia information
for retrieval and classification. In this
work we address the problem of constructing efficient bov
representations of complex shapes such as the Maya syllabic
hieroglyphs. Based on retrieval experiments, we
evaluate the performance of several variants of the recent
sparse coding method KSVD, and compare it with the
traditional k-means clustering algorithm. We investigate the
effects of a thresholding procedure used to facilitate the
sparse decomposition of signals that are potentially sparse,
and we also assess the performance of different pooling
techniques to construct bov representations. Although the
bovs computed via sparse coding do not outperform the
retrieval precision of those computed by k-means, they
achieve competitive results after adequate enforcement
of sparsity, which leads to more discriminative bag
representations than those built from the original non-sparse
descriptors. We also propose a simplified formulation of the
HOOSC descriptor that improves the retrieval performance.
Index Terms— indexing, clustering, sparse coding, shape
descriptor, Maya culture, hieroglyph.
I. INTRODUCTION
The collection of digital imagery has been boosted
in recent years by a whole new generation of devices
that make it possible to gather thousands of high-quality
images. This has created the need for efficient tools to index
large image data sets and to retrieve images that are
similar to a given query in terms of visual content. This
phenomenon is widespread in different fields, such as
photography, painting, the arts, and archaeology.
One instance of the above-mentioned phenomenon is
the AJIMAYA project (Hieroglyphic and Iconographic
Maya Heritage) conducted by the National Institute of
Anthropology and History of Mexico (INAH). Despite
the success of the project in gathering a collection
of images of all existing monuments in some of the
archaeological Maya sites within the Mexican territory,
the manual cataloging of the hieroglyphs remains to be
accomplished, mainly due to the large amount of information
that has been generated and the lack of automatic
and semiautomatic tools to support the cataloging goal.
For instance, Fig. 1 shows a Maya inscription with a large
number of hieroglyphs.
The Maya writing system is composed of two main
types of hieroglyphs: logograms (words) and syllabograms
(syllables). The blocks found in inscriptions
usually exhibit one or two logograms accompanied by one
to four syllabograms, complementing each other to build
coherent sentences. Fig. 2(a) shows four vertically
arranged blocks, each of which contains both syllabograms and
logograms. A third type of Maya glyph, which corresponds
to Maya art, is known as iconography, e.g., Fig. 2(b). In
our work we focus on the description and retrieval of
Maya syllabograms.
Figure 1. Maya inscription found in a lintel in Yaxchilan. The inscription
is rich in hieroglyphs which are cataloged manually. © AJIMAYA.
Currently, roughly 1000 different hieroglyphs
have been discovered, of which only about
80% have been deciphered. The other 20% remain
unknown, and archaeologists continue finding new
hieroglyphs that need to be identified and classified.
In this paper, we present recent advancements made
towards the design of an efficient content-based retrieval
engine for epigraphic versions of Maya hieroglyphs. We
conducted a systematic study to assess the quality of
recently proposed techniques to represent and retrieve images.
More specifically, we evaluate bag-of-visterms representations
constructed with two indexing techniques: the KSVD
algorithm, which is a recent method for sparse coding [1],
and the traditional k-means clustering [2].
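As a point of reference for the k-means side of this comparison, the following is a minimal sketch of how a bag-of-visterms histogram can be built: a vocabulary is learned by clustering local descriptors, each descriptor of a shape is assigned to its nearest cluster center, and the assignments are counted. It is an illustration under stated assumptions, not the implementation evaluated in this work. The function names (`kmeans`, `bov_histogram`), the toy 2-D points standing in for real shape descriptors, the hard nearest-center assignment, and the L1 normalization are all choices made here for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, point):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(centers[k], point))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centers (the visual vocabulary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bov_histogram(descriptors, centers):
    """L1-normalized count of descriptors assigned to each visterm."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest(centers, d)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy usage: 2-D points stand in for real local shape descriptors
# (hypothetical data, chosen only to make the clusters obvious).
training = [[0, 0], [0, 1], [10, 10], [10, 11]]
vocabulary = kmeans(training, k=2)
histogram = bov_histogram([[0, 0], [10, 10], [10, 11], [9, 10]], vocabulary)
print(sorted(histogram))  # [0.25, 0.75]
```

In a sparse coding variant, the hard assignment in `bov_histogram` would be replaced by the coefficients of a sparse decomposition over a learned dictionary, followed by a pooling step; that variant is not sketched here.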
According to [3], sparse coding is a method to represent
signals as sparse linear combinations of an overcomplete
set of basis functions called a dictionary. The
method is inspired by research in the neuroscience
community, which suggests that the receptive field on
JOURNAL OF MULTIMEDIA, VOL. 7, NO. 2, APRIL 2012 179
© 2012 ACADEMY PUBLISHER
doi:10.4304/jmm.7.2.179-192