A Multidimensional Approach in Content-based Multimedia Information Retrieval System

Indra Budi, Zainal A. Hasibuan, Gema P. Mindara
Faculty of Computer Science, University of Indonesia, Depok, Indonesia
indra@cs.ui.ac.id, zhasibua@cs.ui.ac.id, gema.parasti@ui.ac.id

Albaar Rubhasy
Department of Computer System, STMIK Indonesia, Jakarta, Indonesia
albaar.rubhasy@stmik-indonesia.ac.id

Abstract: In this digital era, digital multimedia information is heavily used and is growing rapidly owing to the development of the Internet. Users therefore demand more effective content-based multimedia information retrieval systems (CBMMIRS). The major challenge in this research area is that a multimedia document comprises more than one type of content (i.e., text, image, and audio). To address this challenge, many works have focused on developing indexing techniques that accommodate multiple representations of a multimedia object, known as object features. However, most experiments use only one kind of collection, for example a collection of WWW pages, a video collection, or an image collection. In this paper, we propose a multidimensional approach that accommodates semantic indexing of various multimedia contents across different multimedia collections, since different multimedia documents may share similar information. The architecture comprises three components: (1) a collection manager, which manages the multimedia document repository; (2) an indexer, which handles multimedia concept detection and indexing; and (3) a query processor, which deals with queries and search results. Our hypothesis is that the more complete a document is (that is, the more feature spaces in which it is indexed), the more relevant it is and the higher it should be ranked in the search results.

Keywords: CBMMIRS, multimedia information retrieval, multidimensional approach

I.
INTRODUCTION

With the development of the Internet, the use of digital multimedia information (including audio, video, images, and graphics) is growing rapidly and plays an important role in modern life. Most multimedia files are published and distributed in various formats via social media on the Internet, for instance Facebook (http://www.facebook.com), Flickr (http://www.flickr.com), and YouTube (http://www.youtube.com). As a result, there is an explosion of digital multimedia objects, and users demand more efficient yet accurate content-based multimedia information retrieval systems (CBMMIRS).

Because digital multimedia collections are large and varied, a text-based retrieval system is considered inefficient in terms of both the human labor it requires and the precision it achieves. Therefore, in the early 1980s, content-based information retrieval (CBIR) was introduced to overcome these disadvantages. However, a multimedia document may by nature consist of more than one type of content, for example text, images, video, and audio. Thus, in the late 1990s, a novel approach emerged that combines text-based and content-based retrieval methods in order to boost CBMMIRS performance. Many authors describe this technique as multimodal information retrieval, in which the system indexes and retrieves using various object representations (modalities), such as text, color, and texture.

Nevertheless, in many papers the authors used only one type of multimedia collection, such as TRECVID for a video collection [1], MIRFLICKR for an image collection [2], or WIKIPEDIA-MM for a collection of World Wide Web (WWW) pages [3]. In this paper, we propose a multidimensional approach that accommodates both the heterogeneity of multimedia collections and the variety of multimedia contents (i.e., textual, visual, and audio).
The goal of this approach is to achieve completeness of information, meaning that the most relevant information must be available in many types of content. Although this approach may be fruitful, there is a constraint on the number of object features that can be applied. Excessive use of object features in indexing may lead to poor performance because of the well-known "curse of dimensionality" problem [4]. As the dimensionality of the feature space increases, the performance of indexing algorithms degrades. Research has shown that when the dimensionality is above 10, the performance is no better than a simple sequential scan [5].

This paper explores a multidimensional approach in CBMMIRS. The rest of the paper is organized as follows. Section 2 reviews work related to this paper. Section 3 focuses on the multidimensional approach in CBMMIRS using high-dimensional feature spaces with various types of collections. Section 4 concludes the paper and discusses the future work to be conducted.
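The dimensionality constraint mentioned in this section can be illustrated with a small experiment. The following is a minimal sketch, not part of the proposed system: it assumes uniformly random feature vectors in the unit hypercube and Euclidean distance, and the function name `relative_contrast` is hypothetical. It measures how much farther the farthest point is from a query than the nearest one. As the dimensionality grows this contrast tends to shrink, which is one commonly cited reason why distance-based index structures lose their advantage over a sequential scan.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances between one
    random query and n_points random vectors in the unit hypercube.

    Assumption: uniformly random data; real feature vectors may behave
    differently.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for a few feature-space sizes.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions, the printed contrast should fall as `dim` increases, so the nearest and farthest neighbors become harder to tell apart. This suggests that the number of object features used per index may need to be limited or reduced.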