A Unified Model for Multimedia Retrieval by Content

Punpiti Piamsa-nga, Nikitas A. Alexandridis, George Blankenship
EECS, GWU, Washington DC, 20052

G. Papakonstantinou, P. Tsanakas, S. Tzafestas
EECS, National Tech. Univ. of Athens, Athens, Greece

Abstract

In this paper, we propose a unified model for indexing and retrieving multimedia data using characteristic features and multiresolution processing. All multimedia data types, such as audio and video, are represented as k-dimensional signals in a spatio-temporal domain [3]. A k-dimensional signal is transformed into characteristic features, and these features are stored in a hierarchical multidimensional structure called the k-tree. With this approach, both the retrieval accuracy and the response time can be adapted dynamically. We present measurements of accuracy and response time for the various levels of the k-tree structure.

1. Introduction

Content-based indexing has become more important because conventional databases cannot provide the necessary efficiency and performance [2]. However, it encounters three major difficulties. First, data content (content, or feature) is subjective information that is used to characterize the data. Second, the huge data size affects the computation, since content-based retrieval requires similarity matching of the features. Third, the data is multidimensional in the spatio-temporal domain.

In this paper, we propose a model for multimedia retrieval based upon multiresolution processing of a unified structure to handle queries. Our retrieval method is based upon finding the minimal distance values from the query transform to the transforms stored in the search space; each transform is stored in a k-tree. The data transformation can be performed with existing discrete mathematical functions or with histogram comparisons. The system exploits the k-tree structure to hold spatial and/or temporal information for k-dimensional multimedia data.
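As an illustration of minimal-distance retrieval over multiresolution features, the following Python sketch may help. It is only a sketch under stated assumptions, not the system measured in this paper: it assumes k = 2 (a square grayscale image), treats each tree level as a regular grid of 2^depth by 2^depth regions (quadtree-like), uses a normalized intensity histogram as the feature of each region, and uses the L1 distance. The names `features`, `distance`, and `retrieve` are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # square grayscale image, values in [0, 256)


def histogram(img: Image, r0: int, c0: int, size: int, bins: int = 4) -> List[float]:
    """Normalized intensity histogram of the size x size region at (r0, c0)."""
    counts = [0] * bins
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            counts[img[r][c] * bins // 256] += 1
    n = size * size
    return [x / n for x in counts]


def features(img: Image, depth: int, bins: int = 4) -> List[float]:
    """Concatenated region histograms at one tree level (depth 0 = whole image).

    Assumption: the image side is divisible by 2**depth.
    """
    cells = 2 ** depth
    size = len(img) // cells
    out: List[float] = []
    for i in range(cells):
        for j in range(cells):
            out.extend(histogram(img, i * size, j * size, size, bins))
    return out


def distance(a: List[float], b: List[float]) -> float:
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query: Image, database: Dict[str, Image], depth: int) -> str:
    """Return the key of the stored image with minimal distance to the query."""
    qf = features(query, depth)
    return min(database, key=lambda name: distance(qf, features(database[name], depth)))


# Two toy images with identical global histograms but different layouts.
left_bright: Image = [[255, 255, 0, 0] for _ in range(4)]
top_bright: Image = [[255] * 4, [255] * 4, [0] * 4, [0] * 4]
database = {"left_bright": left_bright, "top_bright": top_bright}

coarse = retrieve(top_bright, database, depth=0)  # tie at distance 0; first key wins
fine = retrieve(top_bright, database, depth=1)    # spatial layout now separates them
```

In this toy case the two stored images cannot be distinguished at depth 0, because their whole-image histograms are identical, while depth 1 separates them at the cost of a feature vector four times as long. This is the kind of accuracy versus cost trade-off that choosing a tree level is meant to expose.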
The k-tree structure gives the benefit of multiresolution processing: choosing an appropriate level of the tree optimizes storage usage and processing time. Furthermore, the same model can be applied to different media. In this paper, we present performance measurements of using the k-tree structure for searching large image databases.

2. Multimedia data and features

Multimedia data can be viewed as raw data or as the features that categorize it. Raw multimedia data consists of data structures with diverse characteristics, such as image, audio, video, and motion picture. An image is a two-dimensional signal in which both dimensions are spatial. Audio is a one-dimensional signal [5]. It is usually a fixed-frequency sequence of amplitude measurements in a temporal domain. An audio signal in a multimedia system is usually sampled and encoded from an analog signal. Video is a three-dimensional signal; two dimensions represent the spatial data, and the third dimension is time. The information of a motion picture is stored as a sequence in time of image frames. Motion picture data is a multidimensional composite signal: a temporally synchronized signal of two types of data, video and audio. The processing of multimedia data has a familiar trade-off; one must select an importance ranking of data quality, storage, and computation speed.

Features of multimedia data are subjective information. They are used to distinguish one selection of multimedia data from others. Features are classified into two types: low-level and high-level features. Low-level features, such as a color histogram, can be extracted from raw multimedia data by a mathematical computation, such as an image processing algorithm. High-level features, such as the characteristics of a human face, cannot be readily and efficiently extracted through the use of a mathematical model. The processing of high-level features is beyond the scope of this paper.

3. Retrieval algorithm

The search for an item in a multimedia database uses unique features as the key index. Exact key matching database systems are inefficient and inappropriate for multimedia data. The subjective content of the keys makes the exact match approach unsuitable; similarity searching is a more appropriate approach. Just as in an exact match approach, the crucial issues are the building of the index table and the retrieval scheme. A multimedia index entry should contain the salient features, which have been extracted from the raw data,