A SPEECH/MUSIC DISCRIMINATOR-BASED AUDIO BROWSER WITH A DEGREE OF CERTAINTY MEASURE

Jani Penttilä *, Johannes Peltola *, Tapio Seppänen †

* VTT Electronics, Oulu, Finland, E-mail: jani.penttila@vtt.fi
† University of Oulu, Oulu, Finland

Abstract. In recent years, content-based audio signal classification and retrieval has attracted growing interest among researchers around the world. This paper describes a technique for automatically discriminating between speech and music in audio signals. Our goal was to achieve reliable classification results using computationally inexpensive time-domain features. The classification results for lengthy real-world signals are presented as filtered time series that show the degree of certainty of belonging to a particular class. We use a four-dimensional feature space and a standard kNN classifier. The features are based on the fluctuation of energy and power over time. The current classification accuracy of the system is 97.9%.

1. Introduction

New kinds of terminal devices, high-speed wireless communications, real-time distributed software, and intelligent media servers will allow the development of more demanding mobile computing applications in the near future. The whole communications infrastructure will change into a hybrid network where mobile communicators, Internet, and digital TV services may be accessed through different types of terminals. The primary research objectives are set by the increasing demand for systems and application prototypes that provide the means to efficiently retrieve information from this global network of digital media. As the communications infrastructure changes, individual consumers will be able to make on-demand requests for specific multimedia material stored in the digital media servers.
The extensive archives may also be browsed with customized data-mining tools that offer the possibility to download data according to a preset preference configuration or an automatically generated user profile. The need for powerful and competent search robots will grow dramatically with the rapidly increasing volume of archived material. The current search technology is limited to annotating the digitized material by hand. Automating the off-line annotation would provide a more cost-effective method to create a powerful indexing mechanism and to make the on-line search operations more efficient and user-friendly.

Research areas in content-based classification and retrieval of audio include speech recognition, speaker identification, sound source recognition, automatic transcription of music, tempo estimation, and beat tracking, among others. The first step in building a content-based audio signal classification system is making a rough division between speech and music. The simplicity and speed of the algorithm are also important targets, so that a satisfactory level of performance is achieved when processing a large amount of digital material.

The problem of distinguishing speech signals from music has become increasingly important, especially with the popularization of automatic speech recognition in multimedia domains. When using automatic speech recognition, it is understandably important to be able