NOVEL VARIABLE LENGTH TEAGER ENERGY BASED FEATURES FOR PERSON RECOGNITION FROM THEIR HUM

Hemant A. Patil 1 and Keshab K. Parhi 2

1 Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India-382 007.
2 Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA.

ABSTRACT

Most state-of-the-art voice biometrics systems use the natural speech signal (read, spontaneous, or contextual speech) from the subjects. In this paper, an attempt is made to identify speakers from their hum. A new feature set, viz., Variable length Teager Energy Based Mel Frequency Cepstral Coefficients (VTMFCC), is proposed for this problem. Experiments have been carried out for person identification and verification tasks using Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC) with a polynomial classifier of 2nd order approximation. It is shown that the speaker identification rate for the proposed feature set outperforms LPCC by 13.6% and is competitive with baseline MFCC. For speaker verification, a reduction in equal error rate (EER) of 1.73% is achieved when a score-level fusion system is employed by combining evidence from MFCC and VTMFCC.

Index Terms: Voice biometrics, Humming, VTEO

1. INTRODUCTION

In this paper, we propose a voice biometrics system for identification of speakers based on their hum using variable length Teager energy-based acoustic features. A hum is a sound made by singing a wordless tone with the mouth completely closed, forcing the sound to emerge from the nose. To hum is to produce such a sound, most often with a melody. As humming contains no linguistic information, voice biometrics based on humming is a challenging research issue. However, a humming-based speaker recognition system may be applicable to a person with a speech disorder or to an infant who is not able to speak [1], [2].
In terms of universality, which is an essential criterion to be considered while designing any biometric system [3], humming is more universally available than speech [4] and has relevance in forensic conditions [5]. Fig. 1 and Fig. 2 show the hum signals sampled at 22050 Hz (and their corresponding pitch contours, vocal tract resonances and spectrograms) produced for a Hindi song, viz., ‘Ye Sham Mastani Madhhosh Kiye Jaay’ (This beautiful evening charges me), by two male speakers aged 21 years. It is evident from the plots (both the time-domain waveforms and the pitch striations in the spectrograms) that the hum signal is mostly periodic in nature. In addition, the pattern of the hum signal, the pitch contour, the formant contour and the spectrogram are distinct for each speaker.

Fig. 1. (a) Hum for a song, viz., ‘Ye Sham Mastani Madhhosh Kiye Jaay’, by male speaker A, (b) Spectrogram and formant contour of the hum shown in (a), (c) Corresponding pitch contour.

Fig. 2. Similar hum analysis for male speaker B.

4526 978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010

This motivates us to investigate whether we can