Automating the Indexing of Images Using Tree Tessellation Algorithm v.10
D. Tsishkou¹, L. Chen¹
¹ LIM (Laboratory of Informatics and Multimedia), Ecole Centrale de Lyon, 36 Avenue Guy de Collongue, 69131 Ecully Cedex, France
{dzmitry.tsishkou, liming.chen}@ec-lyon.fr

Abstract
This paper describes an experimental study of the Tree Tessellation Algorithm v.10 (TTA10) for indexing and retrieval in large image databases. Our approach consists of primary and secondary tessellation stages, so it can be deployed on a computer cluster with a parallel multiprocessor architecture. We present results on a compact index structure that increases the system's scalability and achieves logarithmic complexity. The tree-structured index in TTA10 is optimized for balance monitoring and real-time sub-tree reorganization, which avoids changes to the structure of the entire tree. Finally, we describe a practical implementation of our indexing/retrieval system on a database of 122,000 images and report results on tree balance, topology effectiveness, complexity, and performance.

Keywords
Indexing, retrieval, complexity, images, database, multimedia, MPEG-7, video.

1 Introduction
In recent years, there has been growing interest in developing effective methods for searching large image databases based on image content. This interest has grown out of the need to manage the large image databases that are now commonly available on the Web and on wide area networks [1,2].

Visual information is rich in content [6]. The same picture may invoke different responses from different users, at different times, and in different contexts. A document may have different meanings at different levels, e.g., description, analysis, and interpretation [1]. With MPEG-7, an audio/video document is represented by a hierarchical structure, both syntactically and semantically [2]. With syntactic decomposition, a document is divided into a hierarchy of segments, known as a segment tree.
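The syntactic decomposition can be pictured as an ordinary rooted tree whose nodes are segments. The sketch below is illustrative only: the `Segment` class, its methods, and the "scene"/"shot" labels are hypothetical and are not part of the MPEG-7 description schemes, which are defined in XML rather than as program objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """One node of a syntactic segment tree (hypothetical, not the MPEG-7 schema)."""
    label: str
    children: list[Segment] = field(default_factory=list)

    def add(self, child: Segment) -> Segment:
        # Attach a sub-segment and return it so the hierarchy can be built top-down.
        self.children.append(child)
        return child

    def depth(self) -> int:
        # A leaf segment has depth 1; otherwise 1 + the depth of the deepest child.
        return 1 + max((c.depth() for c in self.children), default=0)

    def walk(self, level: int = 0):
        # Pre-order traversal yielding (level, label) pairs.
        yield level, self.label
        for c in self.children:
            yield from c.walk(level + 1)


# Usage: a document decomposed into a hierarchy of segments.
doc = Segment("document")
scene = doc.add(Segment("scene 1"))
scene.add(Segment("shot 1.1"))
scene.add(Segment("shot 1.2"))
doc.add(Segment("scene 2"))

for level, label in doc.walk():
    print("  " * level + label)
```

Each node only knows its children, so a sub-segment can itself be decomposed further without changing the rest of the hierarchy.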
A segment is further divided into video segments and audio segments, corresponding to the video frames (images) and the audio waveform, respectively. A frame can be recursively divided into subregions to form a region tree. Each subregion is described by a set of descriptors. The visual descriptors can be categorized into four groups: color, shape, motion, and texture. This paper concentrates on an image indexing/retrieval algorithm that uses MPEG-7 visual descriptors as object features.

Recent research has made considerable progress in visual information retrieval. Several systems, such as Virage, QBIC, Photobook, VisualSEEk, and VideoQ [3], provide efficient tools that let users specify visual queries using example images or sketches. Multiple visual features are used in combination. To handle large numbers of widely varying real-world objects captured in natural settings, some of these systems store their index in a hierarchical structure.

In this paper we do not consider hierarchical region segmentation, the comparison of visual descriptors, or the construction of a search system based on multiple features. We evaluate a TTA10-based solution that indexes and retrieves images using a single visual descriptor represented as a feature vector.

The most widely studied methods for speeding up search are k-trees, k-d trees [5], R-trees, tree-structured vector quantization (TSVQ), and other tree-structured methods. Hierarchical discriminant analysis for image retrieval has also been studied [4]. The TTA10 algorithm uses primary and secondary tessellation stages to build an efficient tree-structured index. The major characteristics of an effective hierarchical (tree-based) image indexing/retrieval system are the following:
• Logarithmic complexity, i.e., the tree structure is, in general, highly balanced;
• A compact index structure, which makes it possible to run the search system on mid-range hardware;
• A parallel indexing/retrieval architecture, so that a computer cluster can serve high-end image databases with billions of samples;
• Dynamic sub-tree monitoring and reorganization, which allows new images to be added to the database in real time without rebuilding the entire tree-structured index;
• An effective tree topology, which is necessary to speed up nearest-neighbor search.

The remainder of this paper is organized as follows: