© 2015, IJCSE All Rights Reserved 101
Review Paper Volume-3, Issue-8 E-ISSN: 2347-2693
Speaker Recognition System Techniques and Applications
Sukhandeep Kaur 1* and Kanwalvir Singh Dhindsa 2
1*,2 Dept. of CSE, BBSBEC Fatehgarh Sahib, Punjab Technical University, INDIA
Received: Jul/09/2015 Revised: Jul/22/2015 Accepted: Aug/20/2015 Published: Aug/30/2015
Abstract- Speaker verification is a feasible method of controlling access to computer and communication networks. It is an
automatic process that uses human voice characteristics, obtained from a recorded speech signal, as the biometric measurements
to verify the claimed identity of a speaker. Speaker verification systems fall into two categories: text-dependent and
text-independent. This paper introduces the fundamental concepts of speaker verification for security systems and focuses on
the techniques and their distinguishing features.
Keywords- Speaker Identification, Gammatone Frequency Cepstral Coefficient, Mel Frequency Cepstral Coefficient
1. INTRODUCTION
Speech is the primary means of communication between
humans. Speaker recognition is the process of
automatically recognizing an individual on the basis of
characteristics of the words spoken. Speaker recognition
has long been targeted at security systems that protect
information from unauthorized access. Speaker
verification is a branch of biometric authentication. This
paper compares voice recognition techniques. The feature
parameters should be easy to extract, hard to imitate, and
stable across space and time; commonly used features
include LPC, LPCC, MFCC, and GFCC [4]. The methods
currently in common use for speaker recognition are GMM
(Gaussian Mixture Model), HMM (Hidden Markov Model),
and ANN (Artificial Neural Network). GMM extends the
Gaussian probability density function and works well in
speaker recognition systems because it can closely
approximate probability density distributions of arbitrary
shape. HMM also performs well in speaker recognition and
achieves high accuracy. The three HMM-based methods are
DHMM (discrete HMM), CHMM (continuous HMM), and SCHMM
(semi-continuous HMM) [1]. ANN is a computational model
based on the structure and functions of biological neural
networks. A typical ANN has three interconnected layers.
The first layer consists of input neurons. These neurons
send data to the second layer, which in turn sends its
outputs to the third layer of output neurons.
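The claim that a GMM can approximate densities of arbitrary shape can be illustrated with a small sketch. The code below is only an illustration, not taken from any of the cited works: it evaluates a one-dimensional mixture density as a weighted sum of Gaussian components. The function names and the two-component parameters are assumptions chosen for the example; a real speaker model would use multivariate components fitted to feature vectors such as MFCCs.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """Mixture density: weighted sum of Gaussian components.

    The weights are assumed to be non-negative and to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture with modes near -2 and +2.
# A single Gaussian cannot represent this two-peaked shape.
WEIGHTS = [0.5, 0.5]
MEANS = [-2.0, 2.0]
VARIANCES = [0.5, 0.5]


def bimodal_density(x):
    """Density of the illustrative two-component mixture at x."""
    return gmm_pdf(x, WEIGHTS, MEANS, VARIANCES)
```

Evaluating `bimodal_density` at -2, 0, and 2 shows two peaks separated by a valley, a shape that adding more components can refine further.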
2. RELATED WORK
Mukherjee et al. [2] noted that voice is one of the most
promising and mature biometric modalities for access
control. Their paper presents a new method of recognizing
speakers that employs a new set of features together with
Gaussian mixture models (GMMs). The method of shifted
MFCC was introduced to incorporate accent information
into the recognition algorithm. The algorithm was
evaluated on the TIDIGITS dataset, and the results showed
improvements.
Wang and Ching [7] focused on a feature estimation method
that leads to robust recognition performance, especially
at low signal-to-noise ratios. In the context of Gaussian
mixture model-based speaker recognition in the presence
of additive white Gaussian noise, the new approach
produces a consistent reduction of both recognition error
rate and equal error rate at signal-to-noise ratios
ranging from 0 to 15 dB.
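The equal error rate mentioned above is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The following is a minimal sketch of how it might be estimated from two lists of verification scores. The function names and the example scores are hypothetical and do not come from [7]; the sketch also assumes that a higher score means a stronger match.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the point where FAR and FRR are closest;
    the EER is reported as the average of the two rates at that point.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Made-up scores for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
# At threshold 0.6, one impostor is accepted and one genuine trial is
# rejected, so FAR = FRR = 0.25.
```

With only a few scores the estimate is coarse. Evaluations on real data typically interpolate between thresholds, which this sketch does not attempt.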
Faraj and Bigun [8] presented the first extended study
investigating the added value of lip motion features for
speaker and speech-recognition applications. Digit
identification and person recognition and verification
experiments were conducted on the publicly available
XM2VTS database, showing good results (speaker
verification was 98 percent, speaker recognition was 100
percent, and digit identification was 83 percent to 100
percent).
Sinith et al. [9] placed emphasis on a text-independent
speaker recognition system that adopts Mel-Frequency
Cepstral Coefficients (MFCC) as the speaker speech
feature parameters and Gaussian Mixture Modeling (GMM)
for modeling the extracted speech features. The maximum
likelihood ratio detector algorithm is used for the
decision-making process. The experimental study was
performed for various speech durations and several
languages, and was conducted in the MATLAB 7 environment.
The Gaussian mixture speaker model achieves a high
recognition rate for various speech durations.
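A likelihood ratio detector of the kind mentioned above is commonly described as comparing how well a claimed speaker's model explains the observed frames against how well a background model explains them. The sketch below shows that general idea under simplifying assumptions and is not the implementation of [9]. Single 1-D Gaussians stand in for the speaker and background GMMs, the frames are scalar values rather than MFCC vectors, and all names, parameters, and the threshold are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def avg_log_likelihood(frames, log_pdf):
    """Average per-frame log-likelihood of the frames under a model."""
    return sum(log_pdf(x) for x in frames) / len(frames)


def llr_score(frames, target_log_pdf, background_log_pdf):
    """Log-likelihood ratio: target model minus background model."""
    return (avg_log_likelihood(frames, target_log_pdf)
            - avg_log_likelihood(frames, background_log_pdf))


def accept(frames, target_log_pdf, background_log_pdf, threshold=0.0):
    """Accept the identity claim when the ratio reaches the threshold."""
    return llr_score(frames, target_log_pdf, background_log_pdf) >= threshold


# Stand-in models (assumptions): a narrow target model centred on 1.0
# and a broad background model centred on 0.0.
def target_model(x):
    return gaussian_log_pdf(x, mean=1.0, var=1.0)


def background_model(x):
    return gaussian_log_pdf(x, mean=0.0, var=4.0)


claimed_frames = [0.8, 1.1, 1.3]      # close to the target model
impostor_frames = [-3.0, -2.5, -3.5]  # far from the target model
```

Here `accept(claimed_frames, target_model, background_model)` is true and the same call with `impostor_frames` is false. In practice the threshold would be tuned on held-out data, for example at the equal error rate.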