International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 4, August 2020, pp. 3508~3518
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i4.pp3508-3518
Journal homepage: http://ijece.iaescore.com/index.php/IJECE

Speaker specific feature based clustering and its applications in language independent forensic speaker recognition

Satyanand Singh 1, Pragya Singh 2
1 School of Electrical and Electronics Engineering, Fiji National University, Fiji Island
2 School of Public Health and Primary Care, Fiji National University, Fiji Island

Article Info
Article history: Received Dec 5, 2018; Revised Dec 18, 2019; Accepted Jan 11, 2020

ABSTRACT
Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). Most existing methods measure inter-utterance similarities directly from spectrum-based characteristics, so the resulting clusters may correspond to different acoustic classes rather than to speakers. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. A clustering approach based on peak approximation is then proposed in order to maximize the similarities between language-independent utterances within all clusters. The method uses the K-medoid, Fuzzy C-means, Gustafson and Kessel, and Gath-Geva algorithms to determine the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering that the ultimate outcome can only hit the optimum recognition efficiency. The recognition efficiencies of the K-medoid, Fuzzy C-means, Gustafson and Kessel, and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5% and 99.7%, and their equal error rates (EER) are 3.62%, 2.91%, 2.82% and 2.61%, respectively.
The EER improvement of the Gath-Geva technique based FSR system compared with Gustafson and Kessel and Fuzzy C-means is 8.04% and 11.49%, respectively.

Keywords: Alternative Dunn index (ADI), Classification entropy (CE), Dunn index (DI), Fuzzy maximum likelihood (FML), Partition coefficient (PC), Partition index (SC), Separation index (S), Xie and Beni index (XB)

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Satyanand Singh, School of Electrical and Electronics Engineering, Fiji National University, Fiji Island.
Email: satyanand.singh@fnu.ac.fj

1. INTRODUCTION
Speaker recognition is the general term for the many different tasks of discriminating between one person and another based on the sound of their voices [1]. Forensics means the use of science or technology to investigate and establish facts or evidence in a court of law. The role of forensic science is to provide information (in fact or opinion) to assist investigators and law courts in answering questions of importance. Forensic speaker recognition is the method of determining whether the origin of a questioned voice recording (trace) is a particular person (suspected speaker). This process involves comparing an unidentified voice recording (questioned recording) with one or more recordings of a known voice (the alleged speaker's voice) [1]. Forensic Automatic Speaker Recognition (FASR) is an established term for the adaptation of automatic speaker recognition methods to forensic applications. For automated speaker identification, deterministic or predictive models of the acoustic characteristics of the speaker's voice are compared with the acoustic characteristics of the questioned recordings [1]. Speaker clustering refers to the task of grouping together unidentified speech utterances based on the voice characteristics of a speaker.
The concerns and needs of the speaker recognition community have been a major motivation for the research on speaker clustering for more than