NOVEL VARIABLE LENGTH TEAGER ENERGY BASED FEATURES FOR PERSON
RECOGNITION FROM THEIR HUM
Hemant A. Patil¹ and Keshab K. Parhi²
¹Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India-382 007.
²Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
ABSTRACT
Most state-of-the-art voice biometrics systems use the
natural speech signal (read, spontaneous or contextual
speech) of the subjects. In this paper, an attempt is made
to identify speakers from their hum. A new feature set,
viz., Variable length Teager Energy Based Mel Frequency
Cepstral Coefficients (VTMFCC), is proposed for this
problem. Experiments have been carried out for person
identification and verification tasks with the proposed
features and with Linear Prediction Cepstral Coefficients
(LPCC) and Mel Frequency Cepstral Coefficients (MFCC),
using a polynomial classifier of 2nd-order approximation.
It is shown that the speaker identification rate for the
proposed feature set exceeds that of LPCC by 13.6% and is
competitive with that of the baseline MFCC. For speaker
verification, a reduction in equal error rate (EER) of 1.73%
is achieved when a score-level fusion system is employed to
combine evidence from MFCC and VTMFCC.
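The abstract reports an EER reduction from score-level fusion of MFCC and VTMFCC but does not state the fusion rule. As a minimal sketch, the code below assumes a weighted-sum fusion with a hypothetical weight `alpha` and estimates the EER by sweeping the decision threshold over the observed scores, taking the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function names and toy scores are illustrative, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def fuse_scores(s_mfcc: Sequence[float], s_vtmfcc: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted-sum score-level fusion (assumed rule; alpha is hypothetical)."""
    if len(s_mfcc) != len(s_vtmfcc):
        raise ValueError("score lists must have the same length")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(s_mfcc, s_vtmfcc)]


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. At the threshold where
    |FAR - FRR| is smallest, the EER is reported as (FAR + FRR) / 2.
    Returns (eer, threshold).
    """
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(g < thr for g in genuine) / len(genuine)      # genuine trials rejected
        far = sum(s >= thr for s in impostor) / len(impostor)   # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_thr = abs(far - frr), (far + frr) / 2.0, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Toy verification scores, made up for illustration only.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.3, 0.5, 0.6]
    eer, thr = equal_error_rate(genuine, impostor)
    print(eer, thr)  # -> 0.25 0.6
```

In practice the two systems' scores would typically be normalized to a comparable range before fusion and `alpha` tuned on held-out data; both steps are assumptions here and are left out of the sketch.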
Index Terms— Voice biometrics, Humming, VTEO
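The variable length Teager energy operator (VTEO) underlying VTMFCC is not defined in this section. A form commonly used in the literature generalizes the classical Teager energy operator, Ψ{x(n)} = x²(n) − x(n−1)x(n+1), by replacing the unit lag with an integer lag i: Ψ_i{x(n)} = x²(n) − x(n−i)x(n+i). The sketch below assumes that definition; the function name `vteo` and the choice to drop the i edge samples at each end are illustrative, not taken from the paper. For a pure tone A·cos(ωn + φ) the operator yields the constant A²·sin²(iω), which the demo checks numerically.

```python
import math
from typing import List, Sequence


def vteo(x: Sequence[float], lag: int = 1) -> List[float]:
    """Variable length Teager energy operator (assumed definition).

    psi_i[n] = x[n]**2 - x[n - i] * x[n + i], evaluated only where both
    neighbours exist, so the output has len(x) - 2*lag samples.
    lag=1 gives the classical Teager energy operator.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [x[n] ** 2 - x[n - lag] * x[n + lag] for n in range(lag, len(x) - lag)]


if __name__ == "__main__":
    fs = 22050.0              # sampling rate used for the hum recordings
    f0, amp = 220.0, 0.5      # hypothetical hum pitch (Hz) and amplitude
    w = 2 * math.pi * f0 / fs
    tone = [amp * math.cos(w * n) for n in range(1000)]
    for i in (1, 2, 4):
        out = vteo(tone, lag=i)
        # For a pure tone the output is constant: amp**2 * sin(i*w)**2.
        print(i, out[0], amp ** 2 * math.sin(i * w) ** 2)
```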
1. INTRODUCTION
In this paper, we propose a voice biometrics system that
identifies speakers from their hum using variable length
Teager energy-based acoustic features. A hum is a sound
made by singing a wordless tone with the mouth completely
closed, forcing the sound to emerge from the nose. To hum is
to produce such a sound, most often with a melody. Because
humming carries no linguistic information, humming-based
voice biometrics is a challenging research problem.
However, a humming-based speaker recognition system may
be applicable to persons with speech disorders and to infants
who are not yet able to speak [1], [2]. In terms of
universality, an essential criterion in the design of any
biometric system [3], humming is more universally available
than speech [4], and it is relevant in forensic settings [5].
Fig. 1 and Fig. 2 show hums sampled at 22050 Hz (with
their corresponding pitch contours, vocal tract resonances
and spectrograms) produced for a Hindi song, viz., ‘Ye
Sham Mastani Madhhosh Kiye Jaay (This beautiful evening
charges me),’ by two male speakers, both aged 21 years. It is
evident from the plots (both the time-domain waveforms and
the pitch striations in the spectrograms) that the hum signal
is mostly periodic. In addition, the hum waveform, pitch
contour, formant contour and spectrogram are distinct for
each speaker. This motivates us to investigate whether we can
Fig. 1. (a) Hum for a song, viz., ‘Ye Sham Mastani Madhhosh
Kiye Jaay’, by male speaker A, (b) spectrogram and formant
contour of the hum shown in (a), (c) corresponding pitch contour.
Fig. 2. Similar hum analysis for male speaker B.
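The observation that the hum is mostly periodic, with a speaker-specific pitch contour, can be checked numerically. The sketch below estimates a per-frame pitch by picking the peak of the normalized autocorrelation within an assumed F0 search range. The frame length (30 ms), hop (10 ms), search range (80 to 400 Hz) and voicing threshold (0.5) are hypothetical values, and this is not necessarily the pitch tracker used to produce the contours in Fig. 1 and Fig. 2.

```python
import math
from typing import List, Optional, Sequence


def frame_pitch(frame: Sequence[float], fs: float,
                fmin: float = 80.0, fmax: float = 400.0,
                voicing_threshold: float = 0.5) -> Optional[float]:
    """Autocorrelation pitch estimate for one frame (illustrative only).

    Returns F0 in Hz, or None when the normalized autocorrelation peak is
    below voicing_threshold (the frame is treated as aperiodic).
    """
    n = len(frame)
    if n == 0:
        return None
    mean = sum(frame) / n
    x = [v - mean for v in frame]              # remove DC offset
    energy = sum(v * v for v in x)             # lag-0 autocorrelation
    if energy == 0.0:
        return None
    lag_min = max(1, int(fs / fmax))           # shortest period searched
    lag_max = min(n - 1, int(fs / fmin))       # longest period searched
    best_lag, best_r = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[k] * x[k + lag] for k in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None
    return fs / best_lag


def pitch_contour(signal: Sequence[float], fs: float,
                  frame_ms: float = 30.0, hop_ms: float = 10.0) -> List[Optional[float]]:
    """Slide frame_pitch over the signal to obtain a coarse pitch contour."""
    flen = int(fs * frame_ms / 1000.0)
    hop = int(fs * hop_ms / 1000.0)
    return [frame_pitch(signal[s:s + flen], fs)
            for s in range(0, len(signal) - flen + 1, hop)]


if __name__ == "__main__":
    fs = 22050.0                               # sampling rate of the hum recordings
    f0 = 220.0                                 # hypothetical hum pitch (Hz)
    hum = [0.5 * math.sin(2 * math.pi * f0 * n / fs) for n in range(int(0.1 * fs))]
    print([None if p is None else round(p, 1) for p in pitch_contour(hum, fs)])
```

Real hums contain harmonics and amplitude variation, so a production tracker would normally add safeguards against octave errors; the sketch omits them for brevity.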
4526 978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010