Towards Efficient Automated Singer Identification in Large Music Databases∗

Jialie Shen 1   Bin Cui 2   John Shepherd 1   Kian-Lee Tan 3

1 School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
  {jls, jas}@cse.unsw.edu.au
2 Department of Computer Science, National Lab on Machine Perception, Peking University, China
  bin.cui@pku.edu.cn
3 School of Computing, National University of Singapore, 3 Science Drive 2, Singapore
  tankl@comp.nus.edu.sg

ABSTRACT
Automated singer identification is important for organising, browsing and retrieving data in large music databases. In this paper, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for automated singer recognition. HSI effectively uses multiple low-level features extracted from both vocal and non-vocal music segments to enhance the identification process with a hybrid architecture, and builds profiles of individual singer characteristics based on statistical mixture models. Extensive experiments conducted on a large music database demonstrate the superiority of our method over state-of-the-art approaches.

Categories and Subject Descriptors
H.3 [Information Systems]: Content Analysis and Indexing, Information Storage and Retrieval; J.5 [Arts and Humanities]: Music

General Terms
Algorithms, Experimentation, Measurement, Evaluation

Keywords
Music Retrieval, Singer Identification

1. INTRODUCTION
With the explosive growth of online music repositories, techniques for content-based music retrieval are attracting increasing attention in the multimedia database community. Consequently, the area has become an active research topic, and many techniques have been developed to support automatic classification or recognition of music based on instrument, genre and other characteristics [11, 12, 14, 23]. Such techniques provide users with powerful functions for browsing and searching musical content in a large music database.
∗ This research was partly supported by the Australian Research Council (ARC Discovery Grant DP0346004).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGIR ’06, August 6–11, Seattle, Washington, USA.
Copyright 2006 ACM 1-59593-369-7/06/008 ...$5.00.

Techniques for automatic artist identification are becoming increasingly important because of their many potential applications, including music indexing and retrieval, copyright management and music recommendation systems. The development of singer identification enables the effective management of music databases based on “singer similarity”. With this technology, songs performed by a particular singer can be automatically clustered for easy management or exploration.

Currently, the most popular (and naive) approach to singer identification is to manually embed artist information into the music database with the assistance of music professionals. The main weakness of this approach is that it requires substantial time and domain expertise, which makes it very expensive. Moreover, in some cases, such as music downloaded from the Internet, descriptive information is often lacking and/or inconsistent. At the same time, it is clear that songs performed by the same singer share certain audio characteristics: the singer’s voice is likely to exhibit similar audio patterns across all the songs (s)he performs. Singers also tend to perform within a single genre, and so the audio characteristics of their work may contain common features (e.g. instrumentation). This suggests the feasibility of automatic singer identification based on audio content.
By “automatic singer identification” here we refer to the task of determining which singer is most likely singing in a given song. The effectiveness of a solution relies heavily on its ability to capture the salient information that separates one signal from another. While traditional speech recognition techniques [2, 15] could be applied to this task, they are unlikely to perform well, because the vocal track is intertwined with a nonstationary background signal played by different instruments. Moreover, it is rare that we can acquire a pure solo voice track without instrumentation (unless we had access to the original multi-track data for the song).

Several approaches have recently been proposed that apply statistical models or machine learning techniques to automatic singer classification/identification [12, 19, 23]. In general, these methods consist of two main steps: singer characteristic modelling based on the solo voice, and class label identification. In the singer characteristic modelling step, acoustic signal information is extracted to represent the music. Specific mechanisms (i.e., statistical models or machine learning algorithms) are then constructed to assign songs to one of the pre-defined singer categories based on their extracted acoustic features. Unfortunately, these approaches are unable to achieve acceptable classification accuracy. The main reasons are: 1) Like other kinds of multimedia objects such as images and videos, songs with vocal components contain many audio features. Thus, good identification performance cannot be expected by employing a single type of feature to represent mu-
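The two-step pipeline described above (a per-singer statistical model over acoustic feature frames, then maximum-likelihood label assignment) can be sketched as follows. This is a minimal illustration, not the paper's HSI scheme: the feature values and mixture parameters are made up, the singer models here are diagonal-covariance Gaussian mixtures fixed by hand rather than trained, and the `identify_singer` helper and singer names are hypothetical.

```python
import numpy as np

def log_gmm_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames:    (n_frames, dim) acoustic feature vectors for one song
    weights:   (n_components,) mixture weights
    means:     (n_components, dim) component means
    variances: (n_components, dim) per-dimension variances
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_comp = (-0.5 * np.sum(diff**2 / variances
                              + np.log(2 * np.pi * variances), axis=2)
                + np.log(weights))                                     # (n, k)
    # log-sum-exp over mixture components, then sum over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))

def identify_singer(frames, singer_models):
    """Assign the singer whose model gives the song's frames the highest likelihood."""
    scores = {name: log_gmm_likelihood(frames, *params)
              for name, params in singer_models.items()}
    return max(scores, key=scores.get)

# Toy 2-D "acoustic features" and two hypothetical single-component models
rng = np.random.default_rng(0)
models = {
    "singer_a": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "singer_b": (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]])),
}
song = rng.normal(loc=5.0, scale=1.0, size=(50, 2))  # frames near singer_b's model
print(identify_singer(song, models))  # -> singer_b
```

In a real system the frames would be low-level descriptors (e.g. MFCCs) extracted from the audio, and each singer's mixture parameters would be fitted on training songs; the decision rule, however, is exactly this likelihood comparison.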