Digital Signal Processing 10, 198–212 (2000)
doi:10.1006/dspr.2000.0370, available online at http://www.idealibrary.com

Segmental Approaches for Automatic Speaker Verification

Dijana Petrovska-Delacrétaz,*,1 Jan Černocký,† Jean Hennebert,‡ and Gérard Chollet§

* Circuits and Systems Laboratory, Swiss Federal Institute of Technology, DE-CIRC, 1015 Lausanne, Switzerland; † Institute of Radio-electronics, Brno University of Technology, Czech Republic; ‡ UpperSide Consulting, 52 Chaussée de Vleurgat, 1050 Brussels, Belgium; and § TSI Department, CNRS ENST, Paris, France

E-mail: dijana.petrovska@epfl.ch, cernocky@urel.fee.vutbr.cz, jean.hennebert@upperside.com, chollet@tsi.enst.fr

1 Currently at AT&T Laboratories, Speech Research, 180 Park Avenue, Florham Park, NJ 07932. E-mail: dijana@research.att.com.

Petrovska-Delacrétaz, Dijana, Černocký, Jan, Hennebert, Jean, and Chollet, Gérard, Segmental Approaches for Automatic Speaker Verification, Digital Signal Processing 10 (2000), 198–212.

Speech is composed of different sounds (acoustic segments). Speakers differ in their pronunciation of these sounds. The segmental approaches described in this paper are meant to exploit these differences for speaker verification purposes. For such approaches, the speech is divided into different classes, and the speaker modeling is done for each class. The speech segmentation applied is based on automatic language independent speech processing tools that provide a segmentation of the speech requiring neither phonetic nor orthographic transcriptions of the speech data. Two different speaker modeling approaches, based on multilayer perceptrons (MLPs) and on Gaussian mixture models (GMMs), are studied. The MLP-based segmental systems have performance comparable to that of the global MLP-based systems, and in the mismatched train-test conditions slightly better results are obtained with the segmental MLP system. The segmental GMM systems gave poorer results than the equivalent global GMM systems. © 2000 Academic Press

1051-2004/00 $35.00. Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

Various studies [10, 15, 21, 22] have shown that phonemes have different discriminant power for the speaker verification task. In [22] it was shown that nasals, fricatives, and vowels have better performance than plosives and