Visual Media Retrieval Using Transform-Based Layered Query Scheme

Esin Guldogan and Moncef Gabbouj
Institute of Signal Processing, Tampere University of Technology, Tampere, Finland
E-mail: esin.guldogan@tut.fi, moncef.gabbouj@tut.fi

Olcay Guldogan
Nokia Technology Platforms, Tampere, Finland
E-mail: olcay.guldogan@nokia.com

Abstract—This paper presents a visual media querying scheme referred to as the Transform-Based Layered Query (TLQ) scheme. The TLQ scheme mainly aims at decreasing retrieval processing time and run-time memory consumption without degrading the semantic quality of the retrieval results. The scheme contains abstract layers in the indexing and retrieval phases, where each indexing layer corresponds to a retrieval layer. The layers are constructed from transformations that reduce the dimensions of the visual frames and of the feature data. The proposed TLQ scheme also involves an unsupervised method for eliminating irrelevant media items between the retrieval layers. A two-layer TLQ system is implemented and integrated into the MUVIS content-based multimedia indexing and retrieval framework, and its theoretical advantages are verified with dedicated experiments on image and video databases. The experiments reveal that a 75% improvement in retrieval processing time can be achieved, depending on the transformation parameters.

Keywords—content-based indexing and retrieval; retrieval optimization; query system.

I. INTRODUCTION

Recent technological advances, along with the growth of the Internet, have led to a huge amount of digital multimedia in recent decades. Various methods, algorithms and systems have been proposed to address multimedia storage and management problems. Such studies introduced the concepts of indexing and retrieval, which have further evolved into Content-Based Multimedia Indexing and Retrieval (CBMIR) [1], [2], [3]. Despite various successful systems, there is no perfect global solution for CBMIR in general.
CBMIR systems often analyze multimedia content for indexing and retrieval via so-called low-level features, such as color, texture and shape. Recent systems intend to combine low-level and high-level features to achieve significantly higher semantic performance. However, such combinations make retrieval a more complex and time-consuming process. Additionally, the processing time and memory requirements of feature extraction are becoming more important problems. Due to its high memory and processing power requirements, CBMIR has not been widely used on limited platforms, such as mobile devices or distributed systems. Nevertheless, the usage of CBMIR systems on these platforms is becoming widespread. Hence, the performance optimization of indexing and retrieval plays an important role in practical CBMIR studies.

Retrieval performance optimization is more visible to the end-user of a CBMIR system, although indexing affects retrieval directly. Query performance optimization during retrieval involves three main groups of problems:
• Processing time and computational complexity,
• Disk and run-time memory space requirements, and
• Semantic retrieval performance.

The Transform-Based Layered Query (TLQ) system is a new visual multimedia querying scheme for increasing query performance without degrading semantic performance. TLQ is further described in Section 2. A sample TLQ system implementation integrated into the MUVIS [1] content-based multimedia indexing and retrieval framework is presented in Section 3. The theoretical benefits of the implemented system and its experimental results are also given in Section 3. Finally, Section 4 presents the concluding remarks and discussions.

II. TRANSFORM BASED LAYERED QUERY (TLQ) SCHEME

A. TLQ System Structure

Transform-Based Layered Query (TLQ) is a querying system for multimedia databases that are indexed in so-called indexing/querying layers. It mainly aims at reducing retrieval processing complexity, time and memory consumption.
As shown in the transformation scheme illustrated in Figure 1, these layers are constructed from three transforms: T1, T2 and T3. T1 represents an optional transformation working on visual media, whereas T3 represents a similar optional transformation working only on video data. T2 represents a compulsory transformation working on feature data. Although the TLQ system does not directly depend on any specific transformations, the underlying framework and the transformations should satisfy the assumptions and restrictions below in order to achieve the overall system targets:
- The indexing process and feature extraction depend on frame size in terms of time, memory usage and complexity.
- The video indexing process also depends on video key-frames in terms of time, memory usage and complexity.
- The query process depends on feature data size in terms of time, memory usage and complexity.

0-7803-9134-9/05/$20.00 ©2005 IEEE
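
The layering idea described above can be illustrated with a small two-layer sketch in Python. It is not the MUVIS implementation. The bin-averaging transform standing in for T2, the Euclidean distance, and the fixed `keep_ratio` elimination rule are all assumptions chosen for illustration, as are the function names `t2_reduce` and `layered_query`. The optional transforms T1 and T3, which act on frames and video before feature extraction, are left out.

```python
import numpy as np


def t2_reduce(features: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical T2: shrink feature vectors by averaging groups of
    `factor` adjacent bins. `features` has shape (n_items, dim)."""
    n_items, dim = features.shape
    usable = dim - dim % factor  # drop trailing bins that do not fill a group
    grouped = features[:, :usable].reshape(n_items, usable // factor, factor)
    return grouped.mean(axis=2)


def layered_query(query: np.ndarray, database: np.ndarray,
                  factor: int = 4, keep_ratio: float = 0.25) -> np.ndarray:
    """Two-layer query: rank every item on reduced features, keep the best
    `keep_ratio` share, then re-rank only those on the full features.
    Returns database indices, most similar first."""
    # Layer 1: cheap comparison on reduced vectors. A real system would
    # store the reduced vectors at indexing time instead of recomputing them.
    reduced_db = t2_reduce(database, factor)
    reduced_query = t2_reduce(query[np.newaxis, :], factor)[0]
    coarse = np.linalg.norm(reduced_db - reduced_query, axis=1)

    # Elimination between layers (assumed rule: keep a fixed share).
    n_keep = max(1, int(np.ceil(keep_ratio * len(database))))
    survivors = np.argsort(coarse)[:n_keep]

    # Layer 2: full-dimension comparison on the surviving items only.
    fine = np.linalg.norm(database[survivors] - query, axis=1)
    return survivors[np.argsort(fine)]
```

With `factor=4` and `keep_ratio=0.25`, the first layer compares vectors one quarter of the original length, and the second layer touches only a quarter of the database, which is where the savings in query time and run-time memory would come from under the third assumption listed above.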