AUDIO-VISUAL CONTENT-BASED MULTIMEDIA INDEXING AND RETRIEVAL – THE MUVIS FRAMEWORK

Moncef Gabbouj and Serkan Kiranyaz
Institute of Signal Processing
Tampere University of Technology
Tampere, Finland
moncef.gabbouj@tut.fi

ABSTRACT

MUVIS is a collaborative framework that supports indexing, browsing and querying of various multimedia types, such as audio, video, and interlaced audio/video, in several formats. It supports real-time audio and video capturing and encoding with latest-generation codecs such as MPEG-4, H.263+, MP3 and AAC. MUVIS also supports several audio/video file formats such as AVI, MP4, MP3 and AAC. MUVIS offers a global and unified solution to the content-based indexing and retrieval problem, and provides user-friendly applications together with a generic framework in which third parties can develop their own feature extraction modules. In this paper, we present an overview of the MUVIS system, focusing in particular on the overall audio-based multimedia indexing and retrieval scheme within the MUVIS framework.

1. INTRODUCTION

The growing volume of available multimedia, both audio and visual, requires proper management, indexing and retrieval schemes. To address this need, several content-based indexing and retrieval techniques and applications have been developed, such as the MUVIS system [1], [2], [13], Photobook [3], VisualSeek [4], Virage [5], VideoQ [6] and VideoAL [15]. What these systems have in common is that each provides some kind of framework and several techniques for indexing and retrieving either still images or audio-video files. MPEG-7 [7] is a recent standard for multimedia content description.

We have recently developed a PC-based MUVIS system, which is capable of content-based indexing and retrieval of video and audio information in addition to several image types. Table 1 lists the multimedia types that the MUVIS system currently supports.

MUVIS Audio
Codecs       Sampling Freq.        Channel No   File Formats
MP3 [11]     16, 22.050, 24 kHz    Mono         MP3
AAC [12]     32, 44.1 kHz          Stereo       AAC
G721         Any for G721,                      AVI
G723         G723 & PCM                         MP4
PCM

MUVIS Video
Codecs       Frame Rate            Frame Size   File Formats
H263+ [10]   1 - 25 fps            Any          AVI
MPEG-4 [9]                                      MP4
YUV 4:2:0
RGB 24

MUVIS Image Types
Convertible Formats:    JPEG, JPEG 2K, BMP, TIFF, PNG
Inconvertible Formats:  PCX, GIF, PCT, TGA, PCX, EPS, WMF, PGM

Table 1: MUVIS Multimedia Family

The current version of the MUVIS framework supports the following multimedia processing capabilities and features:
• Real-time audio and video capturing, encoding and recording.
• Hierarchic video handling and representation.
• Video summarization via scene detection [8].
• An effective framework structure that provides an application-independent basis for developing audio and visual feature extraction techniques, which MUVIS applications then use dynamically for indexing and retrieval.
• Retrieval based on distinct visual and aural queries initiated from any MUVIS database that includes audio/video clips and still images.
• Conversion of alien formats to those supported in MUVIS.

The rest of this paper is organized as follows. In Section 2, we outline the system philosophy of MUVIS and the general MUVIS framework with its underlying applications. Section 3 presents the overall audio-based indexing and retrieval scheme in the MUVIS framework. In Section 4, we present experimental results on audio-based multimedia retrieval via query and conclude the paper.
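As an informal illustration of the application-independent framework structure listed among the capabilities above, the following Python sketch shows one way that feature extraction modules could be registered with a database and then used for indexing and query-by-example retrieval. It is not the MUVIS implementation or its API: the names Database, register, add_item, query and mean_and_energy, the toy two-component feature, and the Euclidean distance are all assumptions made only for this example.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: an extractor maps raw samples to a fixed-length feature vector.
Extractor = Callable[[Sequence[float]], List[float]]


def mean_and_energy(samples: Sequence[float]) -> List[float]:
    """Toy feature (illustrative only): mean value and mean energy of the samples."""
    n = len(samples)
    if n == 0:
        return [0.0, 0.0]
    return [sum(samples) / n, sum(s * s for s in samples) / n]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Database:
    """Hypothetical database holding pluggable extractors and a feature index."""
    extractors: Dict[str, Extractor] = field(default_factory=dict)
    index: Dict[str, Dict[str, List[float]]] = field(default_factory=dict)

    def register(self, name: str, extractor: Extractor) -> None:
        # Third-party modules are plugged in by name.
        self.extractors[name] = extractor

    def add_item(self, item_id: str, samples: Sequence[float]) -> None:
        # Index the item with every extractor registered so far.
        self.index[item_id] = {
            name: extract(samples) for name, extract in self.extractors.items()
        }

    def query(
        self, samples: Sequence[float], feature: str, top_k: int = 3
    ) -> List[Tuple[str, float]]:
        # Query by example: extract the same feature and rank items by distance.
        q = self.extractors[feature](samples)
        ranked = sorted(
            (euclidean(q, feats[feature]), item_id)
            for item_id, feats in self.index.items()
            if feature in feats  # skip items indexed before this extractor existed
        )
        return [(item_id, dist) for dist, item_id in ranked[:top_k]]


db = Database()
db.register("mean_energy", mean_and_energy)
db.add_item("a", [0.0, 0.0, 0.0, 0.0])
db.add_item("b", [1.0, 1.0, 1.0, 1.0])
db.add_item("c", [5.0, 5.0, 5.0, 5.0])
result = db.query([1.0, 1.0, 1.0, 1.0], feature="mean_energy", top_k=2)
print(result)
```

In this toy run, item "b" ranks first because its feature vector equals that of the query, followed by "a". The point of the sketch is only the separation of concerns: the database and the query logic do not depend on any particular extractor, so new modules can be added without changing them.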