International Journal of Speech Technology
https://doi.org/10.1007/s10772-019-09630-9

Thorough evaluation of TIMIT database speaker identification performance under noise with and without the G.712 type handset

Musab T. S. Al‑Kaltakchi 1 · Raid Rafi Omar Al‑Nima 2 · Mohammed A. M. Abdullah 3 · Hikmat N. Abdullah 4

Received: 19 March 2019 / Accepted: 28 August 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
In this work, a speaker identification system is proposed which employs two feature extraction models, namely the power normalized cepstral coefficients and the mel frequency cepstral coefficients. Both features are subjected to acoustic modeling using a Gaussian mixture model–universal background model. The purpose of this work is to provide a thorough evaluation of the effect of different types of noise on the speaker identification accuracy (SIA) and thereby to provide benchmark figures for future comparative studies. In particular, the additive white Gaussian noise and eight non-stationary noise types (with and without the G.712 type handset) are tested at various signal to noise ratios. Fusion strategies are also employed using late fusion methods: maximum, weighted sum, and mean fusion. Measurements from 120 randomly selected speakers from the TIMIT database are employed, and the SIA is used to measure the system performance. The weighted sum fusion resulted in the best performance in terms of SIA with noisy speech. The proposed model given in this work and its related analysis pave the way for further work in this important area.

Keywords Speaker identification · TIMIT-database · Stationary and non-stationary background noise · G.712 type handset

1 Introduction

Several biometric systems have been proposed that employ various traits (Chaki et al. 2019), such as speech biometric (Sun et al. 2019), fingerprint (Rajeswari et al. 2017), finger texture (Al-Nima et al. 2017), face (Sghaier et al.
2018), signature (Morales et al. 2017), human ear and palmprint (Hezil and Boukrouche 2017), sclera (Alkassar et al. 2015) and iris pattern (Abdullah et al. 2015). An important application in biometrics and forensics is to identify speakers based on their unique voice pattern, which is known as speaker recognition (Togneri and Pullella 2011). There are many areas where this technique can be successfully applied from a security and investigation perspective, including forensics, remote access control, web services and online banking (El-Ouahabi et al. 2019).

Traditionally, speaker recognition systems were developed and tested in a clean speech environment. However, in many applications of speaker recognition, the speech samples provided to the system may suffer from different types of noise. In order to achieve robust speaker identification, the effect of noise should be investigated, as noise can badly affect the performance of a speaker recognition system (Ming et al. 2007). According to Verma and Das (2015), feature extraction within speaker identification should be only weakly influenced by noise or by the person’s health.

In this work, we present a thorough evaluation for the TIMIT database under a wide range of environmental noise conditions, thereby providing benchmark evaluations for other researchers working in the speaker identification field. In summary, our contributions are as follows.

• Eight non-stationary noise (NSN) types, as well as additive white Gaussian noise (AWGN), are investigated with and without the G.712 type handset.
• The relation between the SIAs for the eight NSN types and AWGN and the signal to noise ratios (SNRs) is measured.

* Musab T. S. Al-Kaltakchi
  musab.tahseen@gmail.com

1 Department of Electrical Engineering, College of Engineering, Mustansiriyah University, Baghdad, Iraq
2 Technical Engineering College of Mosul, Northern Technical University, Mosul, Iraq
3 Computer and Information Engineering Department, College of Electronics Engineering, Ninevah University, Mosul, Iraq
4 College of Information Engineering, Al-Nahrain University, Baghdad, Iraq