Multimedia Systems (2007) 12:533–550
DOI 10.1007/s00530-006-0070-9
REGULAR PAPER
MKL-tree: an index structure for high-dimensional vector spaces
Annalisa Franco · Alessandra Lumini · Dario Maio
Published online: 9 November 2006
© Springer-Verlag 2006
Abstract In this work, a novel hierarchical data structure for
indexing high-dimensional data is proposed. MKL-tree is based on
dimensionality reduction performed by means of the MKL transform,
a multi-space generalization of the KL transform. A local
dimensionality reduction is performed at each node of the tree,
allowing more selective features to be extracted and thus
increasing the discriminating power of the index. The mathematical
foundation for the representation of nodes and leaves, and for the
techniques used to manage the structure, is detailed. Moreover,
the algorithms for bulk loading MKL-tree (i.e., for creating the
tree from a large number of objects given simultaneously), for
updating and splitting nodes after the insertion of new objects,
and for performing similarity searches are described. Results are
reported for the comparison of MKL-tree with other well-known
access methods in terms of I/O and CPU costs and of the precision
of the results in the execution of similarity queries.
Keywords High-dimensional data · Index structures ·
Similarity search · Dimensionality reduction
A. Franco (✉) · A. Lumini
Corso di Laurea in Scienze dell’Informazione,
Università di Bologna, via Sacchi 3,
47023, Cesena, Italy
e-mail: franco@csr.unibo.it
A. Lumini
e-mail: lumini@csr.unibo.it
D. Maio
DEIS - CSITE-CNR - Università di Bologna,
viale Risorgimento 2, 40136, Bologna, Italy
e-mail: dmaio@deis.unibo.it
1 Introduction
Similarity search in multidimensional databases is a problem
widely discussed in the literature [9, 34], and a variety of data
structures [6, 20, 37] have been proposed for indexing vector
spaces, where objects are usually represented as feature vectors
belonging to high-dimensional spaces and are searched by
similarity to a given example. Many of these structures work well
at low to medium dimensionality but, as a consequence of the
phenomenon known as the “dimensionality curse” [3], they are often
outperformed by a simple linear scan for dimensionality above
20–30.
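For reference, the linear scan used as a baseline above can be sketched in a few lines. This is only an illustrative sketch, not code from the paper: the Euclidean distance and the names `euclidean` and `linear_scan_knn` are assumptions made for the example. Every stored vector is compared with the query, so the cost grows linearly with both the number of objects and the dimensionality, but no index has to be built or traversed.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length feature vectors.

    The choice of metric is an assumption made for this sketch.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_scan_knn(
    data: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Return the k nearest vectors as (distance, index) pairs, closest first.

    Every vector in `data` is visited exactly once; there is no pruning.
    """
    scored = ((euclidean(v, query), i) for i, v in enumerate(data))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Tiny hypothetical dataset of 3-dimensional feature vectors.
    data = [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 2.0, 0.0],
        [3.0, 3.0, 3.0],
    ]
    query = [0.9, 0.1, 0.0]
    for dist, idx in linear_scan_knn(data, query, k=2):
        print(idx, round(dist, 4))
```

An index structure is worthwhile only if it answers the same query while visiting far fewer vectors than this scan does, which is what becomes difficult as dimensionality grows.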
This problem is usually dealt with by applying a dimensionality
reduction technique: the data to be indexed are first reduced to a
lower dimensionality by means of the Karhunen–Loève (KL) transform
[19, 23] and then indexed with a traditional data structure. This
approach, usually referred to as global dimensionality reduction
(GDR), works well when the dataset is static and globally
correlated. This assumption does not usually hold in real
applications, where GDR produces an excessive loss of information
and, as a consequence, poor query performance. Recently, new
techniques have been proposed to deal with these problems: a novel
method for performing SVD-based dimensionality reduction in
dynamic databases [25] and a local reduction technique, named
local dimensionality reduction (LDR) [13]. LDR consists of an
indexing structure based on partitioning the data into locally
correlated subsets, each of which is projected into the KL
subspace associated with its elements and indexed independently of
the others by a traditional structure (Hybrid-tree [14] is
suggested). LDR outperforms GDR for locally correlated datasets;
however, it requires a representative set of data to be