Performance of a Text-Independent Remote Speaker Recognition Algorithm over Communication Channels with Blind Equalisation

Katrina Neville, Jusak Jusak, Student Member, IEEE, Zahir M. Hussain, Senior Member, IEEE, and Margaret Lech
School of Electrical and Computer Engineering, RMIT University, Melbourne, Victoria 3000, Australia.
E-mails: s2008178@student.rmit.edu.au; s3001898@student.rmit.edu.au; zmhussain@ieee.org; margaret.lech@rmit.edu.au

Abstract— In this work we present a study of the reliability of a well-known speaker recognition algorithm when the speech is sent over communication channels with channel distortion and noise. The speech features used to train and test this system are the Mel-Frequency Cepstral Coefficients (MFCCs). For speaker recognition applications, channel distortion can lead to serious recognition errors when the speech is transmitted, making the algorithm unreliable for telephone banking or other applications requiring a high level of security. We study the performance and reliability of this algorithm for text-independent speaker recognition with speech sent over a communication channel, using blind equalisation techniques with QPSK modulation.

I. INTRODUCTION

Speaker recognition has attracted considerable attention in both industry and academia over the last two decades, particularly with the increased need to secure sensitive information. Substantial research has examined the accuracy of automatic systems in recognising speakers from unique features within an individual's voice, and in many cases automatic systems have been found to be more accurate at recognising a person from their voice than humans are [1]. Even so, problems can and do arise when speakers need to be identified from their voices after transmission over a channel; compensation must then be applied so that the features extracted from a speech sample remain relevant and accurate.
Many feature extraction techniques are available for speaker recognition; they are commonly classified into two groups, spectral-based and non-spectral-based features [3]. In this paper we have selected the spectral-based Mel-cepstrum algorithm for our system, and we study the system's performance when a decision must be made on voices sent through wired and wireless communication channels. The performance of the system is determined by estimating the error, or distance, between the reference cepstral coefficients and the received (equalised) coefficients.

Equalisation techniques have been used extensively in communication systems to remove the intersymbol interference produced by dispersive channels [9], and have become increasingly important where full utilisation of the channel bandwidth is necessary. Conventional equalisation techniques rely on the transmission of training signals, which reduces the usable channel bandwidth and the allocated resources. Thus, in the last few years, blind equalisation techniques have gained increasing interest. The most popular and most widely implemented blind adaptation algorithm is the constant modulus algorithm (CMA), proposed in [6] and developed independently by [7]. The main advantage of a 'blind' system is apparent where the use of training signals is unrealistic or costly to implement. The CMA has attracted the main research effort as a suitable blind wireless channel equaliser due to its robustness to violations of the perfect blind equalisation (PBE) conditions [8].

II. SYSTEM MODEL

Our system was set up according to the block diagram in Fig. 1: training speech passes through preprocessing and feature extraction; testing speech passes through the channel and channel equalisation, then preprocessing and feature extraction; the two feature sets are differenced to form the decision.

[Figure: block diagram with blocks Training Speech, Testing Speech, Preprocessing, Feature Extraction, Channel, Channel Equalisation, Decision.]
Fig. 1. Speaker Recognition System Block Diagram.
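As a rough illustration of the blind equaliser in the testing path (a minimal sketch, not the authors' implementation), the CMA can be written as a stochastic-gradient update that drives the equaliser output towards a constant modulus. The tap count, step size `mu`, dispersion constant `R2` (1.0 for unit-modulus QPSK), and the centre-spike initialisation are all assumptions for illustration:

```python
import numpy as np

def cma_equalise(received, num_taps=11, mu=1e-3, R2=1.0):
    """Constant modulus algorithm (CMA) blind equaliser (illustrative sketch).

    received : complex baseband samples, e.g. QPSK through a dispersive channel
    R2       : dispersion constant; 1.0 for unit-modulus QPSK
    """
    w = np.zeros(num_taps, dtype=complex)
    w[num_taps // 2] = 1.0                    # centre-spike initialisation
    out = np.zeros(len(received), dtype=complex)
    for n in range(num_taps, len(received)):
        x = received[n - num_taps:n][::-1]    # regressor, most recent sample first
        y = np.dot(w, x)                      # equaliser output
        e = y * (np.abs(y) ** 2 - R2)         # gradient of (|y|^2 - R2)^2 w.r.t. w*
        w = w - mu * e * np.conj(x)           # stochastic-gradient tap update
        out[n] = y
    return out, w
```

Because the update uses only the modulus of the output, no training sequence is needed, which is the bandwidth advantage described above.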
Firstly, the system is trained using 10 different speakers from the German emotional database Emo-DB [5]. (For these experiments only neutral speech samples are used from this database, so changes in emotion cannot interfere with the results obtained.) To train the system, the clean speech is first preprocessed by pre-emphasising it with a first-order high-pass filter; the silence segments are then removed, and twenty Mel-Frequency Cepstral Coefficients are extracted and saved.

To test the system, clean speech from one of the speakers is converted into binary before being passed through a channel with impulse response c = [0.04, -0.05, 0.07, -0.21, -0.5, 0.72, 0.36, 0, 0.21, 0.03, 0.07] and equalised using a blind equalisation algorithm, i.e., the CMA. The binary data is then converted back to a speech file and processed as in the training phase, and the Cepstral

Authorized licensed use limited to: RMIT University. Downloaded on November 18, 2008 at 21:11 from IEEE Xplore. Restrictions apply.
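The preprocessing and scoring steps above can be sketched as follows. This is an illustrative sketch only: the paper does not give the pre-emphasis coefficient, so the common value alpha = 0.97 is an assumption, and the Euclidean cepstral distance is one plausible reading of the "distance between coefficients" measure; the channel vector c is taken verbatim from the text:

```python
import numpy as np

# Channel impulse response from Section II (verbatim from the paper)
c = np.array([0.04, -0.05, 0.07, -0.21, -0.5, 0.72, 0.36, 0.0, 0.21, 0.03, 0.07])

def pre_emphasise(speech, alpha=0.97):
    """First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used coefficient (not stated in the paper).
    """
    speech = np.asarray(speech, dtype=float)
    return np.append(speech[0], speech[1:] - alpha * speech[:-1])

def cepstral_distance(ref_ceps, test_ceps):
    """Euclidean distance between reference and received (equalised) cepstral
    coefficient vectors, as one possible realisation of the error measure."""
    return np.linalg.norm(np.asarray(ref_ceps) - np.asarray(test_ceps))

# Passing a signal through the simulated channel is a linear convolution:
def through_channel(signal):
    return np.convolve(np.asarray(signal, dtype=float), c)
```

For example, a constant (DC) input is strongly attenuated by the pre-emphasis filter, reflecting its high-pass character.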