M. Tistarelli and M.S. Nixon (Eds.): ICB 2009, LNCS 5558, pp. 484–493, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Support Vector Machine Regression for Robust Speaker
Verification in Mismatching and Forensic Conditions
Ismael Mateos-Garcia, Daniel Ramos, Ignacio Lopez-Moreno,
and Joaquin Gonzalez-Rodriguez
ATVS – Biometric Recognition Group,
Escuela Politecnica Superior, Universidad Autonoma de Madrid,
C. Francisco Tomás y Valiente 11, 28049 Madrid, Spain
{ismael.mateos,daniel.ramos,ignacio.lopez,
joaquin.gonzalez}@uam.es
Abstract. In this paper we propose the use of Support Vector Machine
Regression (SVR) for robust speaker verification in two scenarios: i) strong
mismatch in speech conditions and ii) a forensic environment. The proposed
approach seeks robustness in situations where a proper background database is
reduced or not available, a situation typical of forensic cases that has been
called database mismatch. For the mismatching-condition scenario, we use the
NIST SRE 2008 core task as a highly variable environment, but with a mostly
representative background set coming from past NIST evaluations. For the
forensic scenario, we use the Ahumada III database, a public corpus in Spanish
coming from real authored forensic cases collected by the Spanish Guardia
Civil. We show experiments illustrating the robustness of an SVR scheme using
a GLDS kernel under strong session variability, even when no session
variability compensation is applied, and especially in the forensic scenario,
under database mismatch.
Keywords: Speaker verification, forensic, GLDS, SVM classification, SVM
regression, session variability compensation, robustness.
1 Introduction
Speaker verification is currently a mature technology which aims to determine
whether a given speech segment of unknown source belongs to a claimed
individual or not. Among the most important challenges for a speaker
verification system is robustness to the mismatch in conditions between
training and testing utterances, and compensating for this mismatch is a main
factor in improving system performance. Recently, this task has been carried
out using data-driven session variability compensation techniques based on
factor analysis, which have become the state of the art in these technologies,
as can be seen in the periodic NIST Speaker
Recognition Evaluations (SRE) [1]. Such techniques can be applied to the best-
performing systems working at the spectral level, mainly based on Gaussian Mixture
Models (GMM) [2] and Support Vector Machines (SVM) [3], increasing their
robustness and accuracy. Among all the different compensation variants, the Nuisance