International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 4, August 2020, pp. 3508~3518
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i4.pp3508-3518
Journal homepage: http://ijece.iaescore.com/index.php/IJECE
Speaker specific feature based clustering and its applications in
language independent forensic speaker recognition
Satyanand Singh¹, Pragya Singh²
¹School of Electrical and Electronics Engineering, Fiji National University, Fiji Island
²School of Public Health and Primary Care, Fiji National University, Fiji Island
Article Info
Article history:
Received Dec 5, 2018
Revised Dec 18, 2019
Accepted Jan 11, 2020
ABSTRACT
Forensic speaker recognition (FSR) is the process of determining whether
the source of a questioned voice recording (trace) is a specific individual
(suspected speaker). Most existing methods measure inter-utterance
similarities directly from spectrum-based characteristics, so the resulting
clusters may relate not to speakers but rather to different acoustic
classes. This research addresses this deficiency by projecting
language-independent utterances into a reference space equipped to cover
the standard voice features underlying the entire utterance set. A clustering
approach based on peak approximation is then proposed in order to maximize
the similarities between language-independent utterances within all clusters.
This method uses the K-medoid, Fuzzy C-means, Gustafson and Kessel, and
Gath-Geva algorithms to evaluate the cluster to which each utterance should
be allocated, overcoming the disadvantage of traditional hierarchical
clustering that the ultimate outcome can only hit the optimum recognition
efficiency. The recognition efficiencies of the K-medoid, Fuzzy C-means,
Gustafson and Kessel, and Gath-Geva clustering algorithms are 95.2%,
97.3%, 98.5%, and 99.7%, and the EERs are 3.62%, 2.91%, 2.82%, and 2.61%,
respectively. The EER improvement of the Gath-Geva technique based
FSR system compared with Gustafson and Kessel and Fuzzy C-means is
8.04% and 11.49%, respectively.
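The relative EER figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes the improvement is the EER gap divided by the Gath-Geva EER; the abstract does not state this convention, but it is the one that matches the reported numbers. The names `eer`, `relative_gap`, and `gaps` are illustrative only.

```python
# EER values (%) as reported in the abstract.
eer = {
    "K-medoid": 3.62,
    "Fuzzy C-means": 2.91,
    "Gustafson-Kessel": 2.82,
    "Gath-Geva": 2.61,
}


def relative_gap(other: float, best: float) -> float:
    """Percentage by which `other` exceeds `best`, relative to `best`.

    Assumed convention: (other - best) / best * 100.
    """
    return (other - best) / best * 100.0


best = eer["Gath-Geva"]
gaps = {
    name: relative_gap(value, best)
    for name, value in eer.items()
    if name != "Gath-Geva"
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}%")
```

Under this assumption the gaps for Gustafson and Kessel and for Fuzzy C-means come out at roughly 8.05% and 11.49%, which agrees with the reported 8.04% and 11.49% up to rounding.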
Keywords:
Alternative Dunn index (ADI)
Classification entropy (CE)
Dunn index (DI)
Fuzzy maximum likelihood (FML)
Partition coefficient (PC)
Partition index (SC)
Separation index (S)
Xie and Beni index (XB)
Copyright © 2020 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Satyanand Singh,
School of Electrical and Electronics Engineering,
Fiji National University, Fiji Island.
Email: satyanand.singh@fnu.ac.fj
1. INTRODUCTION
Speaker recognition is the general term for the many different tasks of discriminating between one
person and another based on the sound of their voices [1]. Forensics means the use of science or
technology to investigate and establish facts or evidence in a court of law. The role of forensic science is to
provide information (in fact or opinion) to assist investigators and law courts in answering questions of
importance. Forensic speaker recognition is the method of determining whether the origin of a questioned
voice recording (trace) is a particular person (suspected speaker). This process involves comparing an
unidentified voice recording (questioned recording) with one or more recordings of a known voice
(the alleged speaker's voice) [1]. Forensic Automatic Speaker Recognition (FASR) is an established term
for the adaptation of automatic speaker recognition methods to forensic applications. In automated
speaker identification, deterministic or predictive models of the acoustic characteristics of the speaker's
voice are compared with the acoustic characteristics of the questioned recordings [1].
Speaker clustering refers to the task of grouping together unidentified speech
utterances based on the voice characteristics of a speaker. The concerns and needs of the speaker
recognition community have been a major motivation for the research on speaker clustering for more than