Int J Speech Technol (2017) 20:417–429
DOI 10.1007/s10772-017-9419-z
A novel scores fusion approach applied on speaker verification
under noisy environments
Nassim Asbai¹ · Abderrahmane Amrouche¹
Received: 24 April 2016 / Accepted: 19 April 2017 / Published online: 5 May 2017
© Springer Science+Business Media New York 2017
1 Introduction
Over the last two decades, Automatic Speaker Verification
(ASV) has been the subject of considerable research due to
its various applications in areas such as telephone banking,
remote access control and surveillance. One of the main
challenges in developing real-life speaker verification
systems is the undesired variation in speech characteristics
caused by environmental noise (Bin et al. 2007). Such
variations can in turn lead to a mismatch between the
corresponding test and reference models for the same
speaker, and hence to a degradation in speaker verification
accuracy (Ming et al. 2007).
The ASV task can be achieved in three steps:
parameterization, classification and decision (Reynolds
2002). In the parameterization step, features borrowed from
speech recognition technology are used directly (Selouani
and Caelen 1999); the aim is to find parameters that
minimize intra-speaker variability, maximize inter-speaker
variation and offer noise robustness. This yields relevant
coefficient vectors that reduce the quantity and redundancy
of the information, such as Mel-Frequency Cepstral
Coefficients (MFCCs) (Harris 1978), Perceptual Linear
Predictive (PLP) (Hermansky 1990), Linear Predictive Coding
(LPC) (Bundy and Wallen 1984; Itakura 2005) and Linear
Prediction Cepstrum Coefficients (LPCC) (Atal 2005). In the
classification step, the vectors of the test signal (or its
model) are compared to the vectors of the reference speakers
(or their models). In the decision step, a speaker is
accepted or rejected by comparing his or her score to a
threshold calculated in the classification step. Even though
many new speaker verification techniques have been proposed
to model the speakers: Vector Quantization (Gray 1984; Ilyas
et al. 2007), Hidden Markov Model (Ilyas et al. 2007; Blunsom
Abstract To improve speaker verification in adverse
conditions, this paper presents a novel score fusion approach
that uses an adaptive method based on a prior Equal Error
Rate (EER). Currently, the most commonly used fusion methods
are the mean, product, minimum, maximum, or weighted sum of
scores. Our method introduces an MLP network that
approximates the scores estimated under noisy conditions to
the ideal scores estimated in clean environments, and that
gives the optimal weighting parameters to be added to the
adaptive weights used in the weighted sum of scores. The
method is assessed using the NIST 2000 corpus and different
feature extraction methods. Noisy conditions are created
using NOISEX-92. In severely degraded conditions, the results
show that speaker verification using the proposed score
fusion approach, applied to the GMM-UBM and GMM-SVM based
systems, achieves better performance in terms of EER
reduction than either system used alone.
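
The conventional fusion rules named in the abstract (mean, product, minimum, maximum and weighted sum of scores) can be illustrated with a minimal sketch. This is not the adaptive MLP-based method proposed in the paper. The function name `fuse_scores`, the example score values and the choice to normalise the weights so that they sum to 1 are assumptions made here for illustration, and the per-system scores are assumed to have already been mapped to a comparable range.

```python
import numpy as np


def fuse_scores(scores, rule="mean", weights=None):
    """Fuse per-system scores into one score per trial.

    scores  : array-like of shape (n_systems, n_trials)
    rule    : "mean", "product", "min", "max" or "weighted_sum"
    weights : one non-negative weight per system (weighted_sum only)
    """
    s = np.asarray(scores, dtype=float)
    if rule == "mean":
        return s.mean(axis=0)
    if rule == "product":
        return s.prod(axis=0)
    if rule == "min":
        return s.min(axis=0)
    if rule == "max":
        return s.max(axis=0)
    if rule == "weighted_sum":
        if weights is None:
            raise ValueError("weighted_sum requires one weight per system")
        w = np.asarray(weights, dtype=float)
        if w.shape != (s.shape[0],):
            raise ValueError("expected exactly one weight per system")
        w = w / w.sum()  # assumption: weights are normalised to sum to 1
        return w @ s
    raise ValueError(f"unknown fusion rule: {rule!r}")


# Hypothetical normalised scores from two systems on three trials.
gmm_ubm = [0.8, 0.2, 0.6]
gmm_svm = [0.6, 0.4, 0.1]
fused = fuse_scores([gmm_ubm, gmm_svm], rule="weighted_sum", weights=[3, 1])
```

In this sketch the weights are fixed by hand. According to the abstract, the proposed approach instead adapts the weights using a prior EER and parameters produced by an MLP network.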
Keywords Speaker verification · GMM-UBM · GMM-SVM ·
Score fusion · MLP network · Noisy environment
* Nassim Asbai
nasbai@usthb.dz
Abderrahmane Amrouche
namrouche@usthb.dz
¹ Speech Communication and Signal Processing Laboratory,
Faculty of Electronics and Computer Sciences, USTHB,
Bab Ezzouar 16 111, Algeria