Similarity Search in Multimedia Time Series Data using Amplitude-Level Features

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Peter Kunath, Alexey Pryakhin, Matthias Renz

Institute for Informatics, Ludwig-Maximilians-Universität München, Germany
{assfalg,kriegel,kroegerp,kunath,pryakhin,renz}@dbs.ifi.lmu.de

Abstract. Effective similarity search in multi-media time series such as video or audio sequences is important for content-based multi-media retrieval applications. We propose a framework that extracts a sequence of local features from large multi-media time series that reflect the characteristics of the complex structured time series more accurately than global features. In addition, we propose a set of suitable local features that can be derived by our framework. These features are scanned from a time series amplitude-levelwise and are called amplitude-level features. Our experimental evaluation shows that our method models the intuitive similarity of multi-media time series better than existing techniques.

1 Introduction

Time series data is a prevalent data type in multi-media applications such as video or audio content analysis. Videos are usually modeled as sequences of features extracted for each picture of the video stream. Analogously, audio content is modeled as a series of features extracted continuously from the audio stream. Similarity search in such time series data is very important for multi-media applications such as query-by-humming, plagiarism detection, or content-based audio and video retrieval.

The challenge for similarity search in time series data is twofold. First, the adequate modeling of the intuitive similarity notion between time series is important for the accuracy of the search. For that purpose, several distance measures for time series have been defined recently, each of which works well under specific assumptions and in different scenarios (e.g. the Euclidean distance or Dynamic Time Warping (DTW)).
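The two distance measures named above can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function names `euclidean` and `dtw` are hypothetical, and the DTW variant shown assumes the absolute difference as local cost and no warping-window constraint.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length value sequences."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW distance with |x - y| as local cost (an assumption).

    Fills an (len(a) + 1) x (len(b) + 1) table row by row, so the running
    time is proportional to len(a) * len(b).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a, step in b, step in both
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# Toy data (made up for illustration): the same shape, shifted by one step.
a = [0, 1, 2, 1, 0, 0]
b = [0, 0, 1, 2, 1, 0]
print(euclidean(a, b))  # penalizes the shift
print(dtw(a, b))        # warps the time axis and absorbs the shift
```

On this toy pair the Euclidean distance is non-zero while the DTW distance is zero, which is the sense in which DTW is shape-based. The nested loop also shows why the cost of DTW grows quadratically with the sequence length.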
Most of them apply features comprising quantitative information of the time series. However, in particular for complex structured time series, features comprising quantitative information are often too susceptible to noise, outliers, and other interfering variables. Second, since time series are usually very large, containing several thousands of values per sequence, the comparison of two time series can be very expensive, particularly when using distance measures that require access to the raw time series data (i.e. the entire sequence of time series values). For example, for an audio sequence we can derive 300 features per second. Thus, a 3-minute audio sequence is represented by a time series of length 300 × 180 = 54,000. Generally, shape-based similarity measures like DTW are very expensive and are usually not applicable to multi-media data.

In Proc. 14th International MultiMedia Modeling Conference (MMM'08), Kyoto, Japan, 2008.

In this paper, we propose a novel framework for shape-based similarity search on multi-media time series that addresses both mentioned problems. Our approach allows