Int J Speech Technol (2017) 20:417–429
DOI 10.1007/s10772-017-9419-z

A novel scores fusion approach applied on speaker verification under noisy environments

Nassim Asbai 1 · Abderrahmane Amrouche 1

Received: 24 April 2016 / Accepted: 19 April 2017 / Published online: 5 May 2017
© Springer Science+Business Media New York 2017

Abstract To improve speaker verification in adverse conditions, this paper presents a novel score fusion approach that uses an adaptive method based on a prior Equal Error Rate (EER). The most commonly used fusion methods are currently the mean, product, minimum, maximum, and weighted sum of scores. Our method introduces an MLP network that approximates the scores estimated under noisy conditions to the ideal scores estimated in clean environments, and gives the optimal weighting parameters to be added to the adaptive weights used in the weighted sum of scores. The method is assessed using the NIST 2000 corpus and different feature extraction methods. Noisy conditions are created using NOISEX-92. In severely degraded conditions, the results show that speaker verification using the proposed score fusion approach, applied to the GMM-UBM and GMM-SVM based systems, performs better in terms of EER reduction than either system used alone.

Keywords Speaker verification · GMM-UBM · GMM-SVM · Score fusion · MLP network · Noisy environment

* Nassim Asbai
nasbai@usthb.dz

Abderrahmane Amrouche
namrouche@usthb.dz

1 Speech Communication and Signal Processing Laboratory, Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar 16 111, Algeria

1 Introduction

Over the last two decades, Automatic Speaker Verification (ASV) has been the subject of considerable research due to its various applications in areas such as telephone banking, remote access control and surveillance. One of the main challenges in developing a speaker verification system for real-life use is the undesired variation in speech characteristics caused by environmental noise (Bin et al. 2007). Such variations can in turn lead to a mismatch between the test and reference models of the same speaker, and hence to degraded speaker verification accuracy (Ming et al. 2007).

The ASV task can be achieved in three steps: parameterization, classification and decision (Reynolds 2002). In the parameterization step, features borrowed from speech recognition technology are used directly (Selouani and Caelen 1999). They are selected by searching for parameters that minimize intra-speaker variability, maximize inter-speaker variation and offer noise robustness. This provides relevant coefficient vectors that reduce the information in terms of quantity and redundancy, such as Mel-Frequency Cepstral Coefficients (MFCCs) (Harris 1978), Perceptual Linear Predictive (PLP) (Hermansky 1990), Linear Predictive Coding (LPC) (Bundy and Wallen 1984; Itakura 2005) and Linear Prediction Cepstrum Coefficients (LPCC) (Atal 2005). In the classification step, the vectors of the test signal (or its model) are compared to the vectors of the reference speakers (or their models). In the decision step, a speaker is accepted or rejected by comparing his score to a threshold calculated in the classification step. Even though many new speaker verification techniques have been proposed to model the speakers: Vector Quantization (Gray 1984; Ilyas et al. 2007), Hidden Markov Model (Ilyas et al. 2007; Blunsom