Eigen Channel Method for Text-Independent Russian Speaker Verification

Timur Pekhovsky, Ilya Oparin
Speech Technology Center
St. Petersburg, Russia
{tim,ilya}@speechpro.com

Abstract

A method for compensating session variability in text-independent speaker verification is presented in this paper. It is based on maximum likelihood estimation for modelling speaker sessions. The method is shown to reduce the verification error by 21% for 4-second and by 36% for 20-second testing segments compared to the GMM-UBM baseline. The evaluation was performed on conversational speech recorded in GSM channels.

1. Introduction

Session variability is one of the major issues in Gaussian Mixture Model (GMM) based speaker identification. This variability is caused by many phenomena, such as the transmission channel, differences in speech recording, environmental noise and speaker variability. Taking the peculiarities of the transmission channel into account is probably the most important issue.

There are a number of methods for compensating individual channel variation. If we confine ourselves to GMM-based methods, the basic ones are Feature Mapping [1] and Speaker Model Synthesis (SMS) [2]. These methods assume that channel effects are discrete. This assumption matters both at the training stage (when a list of observed channels is specified) and at the testing stage (which calls for pre-identification of the channel for a given session). As a result, Feature Mapping and SMS can adapt the models only to a limited number of very broad data categories, such as GSM or switched telephone data.

If one tries to compensate for individual channels (by this term we mean the influence of both the phone microphone and the connection itself), the discrete methods are hardly applicable. The major problem is that, for a given testing signal, it is highly unlikely that the training corpus contains a counterpart with exactly the same individual channel features.
Moreover, such channel identification would be extremely time-consuming.

A solution can be found in the Eigen Channel method. This method does not require the identification of a channel. It allows the channel of a testing signal, even one that does not occur in the training data, to be represented as a linear combination of eigen channels. Initially, the idea emerged in face recognition (Eigen Faces) and was then applied to speech recognition to account for variation in speakers' voices (Eigen Voices). Finally, the approach was applied to speaker identification in order to compensate for the variation of the speaker across different channels (Eigen Channels).

The approach is based on the observation that most of the channel variability lies in the low-dimensional space of eigen channels. This space can be created by consistent training on the entire training data. During training, the space of GMM super-vectors is split into the sub-space of channel-independent speakers and the sub-space of speaker-independent channels. The split itself is explained in detail in [5,6,7] for the model of Joint Factor Analysis.

In this work, as opposed to the approach originally proposed by P. Kenny [5], we obtain the sub-space of channel-independent speakers by a more traditional method that includes Universal Background Models (UBM). Our research is mostly inspired by the works of R. Kuhn and P. Nguyen, who use Maximum-Likelihood (ML) estimates in the eigen channel method [3,4].
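
The statement that the channel of a testing signal can be represented as a linear combination of eigen channels can be illustrated with a minimal sketch. The toy example below is not the ML estimation procedure referred to in this paper. It assumes a 4-dimensional super-vector space, two orthonormal eigen channels, a known speaker super-vector, and a plain projection to recover the channel factors. All names and numbers are hypothetical and chosen only for illustration.

```python
# Toy illustration (hypothetical names and values): a session super-vector is
# modelled as a speaker super-vector plus a channel offset, where the offset
# is a linear combination of a few eigen channels.


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def channel_offset(eigenchannels, factors):
    """Return sum_k factors[k] * eigenchannels[k]."""
    dim = len(eigenchannels[0])
    return [sum(f * u[i] for f, u in zip(factors, eigenchannels))
            for i in range(dim)]


def estimate_factors(eigenchannels, speaker_sv, session_sv):
    """Project (session - speaker) onto each eigen channel.

    Assumption: the eigen channels are orthonormal, so a projection is
    enough. This stands in for a proper ML estimate of the factors.
    """
    diff = [s - m for s, m in zip(session_sv, speaker_sv)]
    return [dot(u, diff) for u in eigenchannels]


def compensate(eigenchannels, speaker_sv, session_sv):
    """Remove the estimated channel offset from the session super-vector."""
    factors = estimate_factors(eigenchannels, speaker_sv, session_sv)
    offset = channel_offset(eigenchannels, factors)
    return [s - o for s, o in zip(session_sv, offset)]


# Two orthonormal eigen channels spanning a 2-dimensional channel sub-space.
EIGENCHANNELS = [
    [0.6, 0.8, 0.0, 0.0],
    [-0.8, 0.6, 0.0, 0.0],
]

speaker_sv = [0.5, -1.0, 2.0, 0.25]   # channel-independent speaker part
true_factors = [0.75, -0.5]           # a channel never "seen" as a category

# Simulate a session recorded over that channel.
session_sv = [m + o for m, o in
              zip(speaker_sv, channel_offset(EIGENCHANNELS, true_factors))]

estimated = estimate_factors(EIGENCHANNELS, speaker_sv, session_sv)
cleaned = compensate(EIGENCHANNELS, speaker_sv, session_sv)

print("estimated channel factors:", estimated)
print("compensated super-vector: ", cleaned)
```

No channel label is needed at any point in this sketch: the channel is described only by its continuous coordinates in the eigen channel sub-space, which is what distinguishes the approach from the discrete methods discussed above.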