Efficient GMM-UBM System in Text-Independent Speaker Verification Using Structural Gaussian Mixture Models

R. Saeidi *, H. R. Sadegh Mohammadi **, M. Khalaj Amirhosseini *
* Electrical Engineering Department, Iran University of Science and Technology, Narmak, Tehran, Iran.
** Iranian Research Institute for Electrical Engineering, Narmak, Tehran, Iran
Emails: r.saeidy@matn.com, h.sadegh@ijece.org, khalaja@iust.ac.ir

Abstract

In this paper a new method for reducing the computational load of Gaussian Mixture Model Universal Background Model (GMM-UBM) based speaker verification is simulated and discussed. The system uses structural Gaussian mixture models (SGMMs) and a neural network to achieve both computational efficiency and high accuracy in text-independent speaker verification. The effects of two factors, namely the interpolation coefficient and the maximum capacity factor, on Structural Background Model (SBM) construction are studied. The scores obtained in the different layers of a tree-structured model are combined via a neural network to make the final decision. Different configurations are compared in experiments conducted on a TV-recorded speech data set. Experimental results show that an SBM-SGMM system used in conjunction with neural networks can reduce the computational complexity by a factor of about 2.7 with a reduction of about 19% in minimum Decision Cost Function (DCF) compared to the baseline GMM-UBM system.

Keywords: UBM, structural Gaussian mixture model, neural network, speaker verification.

1. Introduction

Speaker recognition, including speaker identification and speaker verification, has been an active research area for several decades. A popular method in speaker verification is to model the speakers with Gaussian Mixture Models (GMMs) based on the maximum-likelihood (ML) criterion, which has been shown to outperform several other existing techniques [1].
The Gaussian Mixture Model Universal Background Model (GMM-UBM) method for speaker verification has also demonstrated high performance in several NIST evaluations and has become the dominant approach in text-independent speaker verification [2].

In many applications, accuracy and computational complexity are two important factors. In GMM-UBM speaker verification systems, the major computational loads are the likelihood calculation over all mixtures of the UBM to select the highest-scoring mixtures (top-C mixtures) and the likelihood calculation for all speaker models in the system. As reported in the literature, such a system with no optimization tends to spend more than 90% of its processing time scoring Gaussian densities. Some straightforward techniques have been investigated to speed up computation in a GMM-UBM speaker verification system while achieving an acceptable tradeoff between accuracy and complexity [3].

Inspired by the method presented in [4], in this paper a structural adaptation scheme is studied which assumes a hierarchical model structure common to all speakers. A multi-resolution GMM is used whose mean vectors are organized in a tree structure, with coarse-to-fine resolution when going down the tree. Bayesian adaptation is then performed in a hierarchical way, propagating the estimated values of the coarsest GMM means down the tree via linear regression between contiguous depths. This allows some of the means of the finest-resolution speaker GMM which are not observed in the training set to be adapted according to their parent (or ancestor) node. As in the classical Bayesian adaptation approach, the parameters of the multi-resolution prior background GMMs are estimated using prior data. We used the minimax method and the procedure described in [4] for the construction of the tree structure in our simulations.

The remainder of the paper is organized as follows. In Section 2, a brief description of GMM-UBM speaker verification is provided.
In Section 3, a tree construction method is explained, along with multi-level adaptation, mixture selection, and the verification procedure. The experimental results are presented in Section 4. Finally, the paper is concluded in Section 5.
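To make the top-C fast-scoring step concrete, the following is a minimal sketch of frame-level GMM-UBM scoring with diagonal covariances: all UBM mixtures are evaluated, the C highest-scoring ones are kept, and the speaker model is evaluated only on those mixtures. The function names, array layout, and the diagonal-covariance assumption are ours for illustration; this is not the paper's implementation.

```python
import numpy as np

def logsumexp(v):
    # Numerically stable log-sum-exp over a 1-D array.
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def gmm_frame_loglikes(x, weights, means, inv_vars, log_dets):
    # Per-mixture log-likelihoods of one frame under a diagonal-covariance GMM.
    # x: (D,), weights: (C,), means: (C, D), inv_vars: (C, D),
    # log_dets[c] = sum of log-variances of mixture c, precomputed.
    d = x - means                            # (C, D) differences
    quad = np.sum(d * d * inv_vars, axis=1)  # Mahalanobis terms, (C,)
    D = x.shape[0]
    return (np.log(weights)
            - 0.5 * (D * np.log(2 * np.pi) + log_dets + quad))

def topc_score(x, ubm, spk, C=5):
    # GMM-UBM fast scoring: evaluate all UBM mixtures, keep only the
    # top-C, then score the adapted speaker model on those mixtures.
    ubm_ll = gmm_frame_loglikes(x, *ubm)
    top = np.argsort(ubm_ll)[-C:]
    w, m, iv, ld = spk
    spk_ll = gmm_frame_loglikes(x, w[top], m[top], iv[top], ld[top])
    # Frame-level log-likelihood ratio: speaker vs. background.
    return logsumexp(spk_ll) - logsumexp(ubm_ll[top])
```

Because the speaker model is MAP-adapted from the UBM, the UBM's top-C mixture indices are a good proxy for the speaker model's, which is what makes this shortcut accurate in practice.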
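The hierarchical adaptation idea outlined in the introduction can be sketched as follows: each tree node with enough enrollment data gets its own MAP shift, interpolated with the shift propagated from its parent, while unobserved nodes simply inherit their ancestor's shift. This is a deliberately simplified illustration under our own assumptions (the `Node`/`adapt_tree` names, a single interpolation coefficient, and a relevance factor of 16 are hypothetical); the actual scheme in [4] propagates means via linear regression between contiguous tree depths.

```python
import numpy as np

class Node:
    def __init__(self, mean, children=()):
        self.mean = np.asarray(mean, dtype=float)
        self.children = list(children)

def adapt_tree(node, stats, relevance=16.0, interp=0.5, parent_shift=0.0):
    # stats maps id(node) -> (frame_count, data_mean) for nodes observed
    # in the enrollment data; unobserved nodes borrow the parent's shift.
    if id(node) in stats:
        n, xbar = stats[id(node)]
        alpha = n / (n + relevance)              # standard MAP weight
        own_shift = alpha * (xbar - node.mean)   # node's own evidence
        # Interpolate the node's own shift with the shift propagated
        # down from the coarser (parent) resolution.
        shift = interp * own_shift + (1.0 - interp) * parent_shift
    else:
        shift = parent_shift                     # inherit from ancestor
    node.mean = node.mean + shift
    for child in node.children:
        adapt_tree(child, stats, relevance, interp, shift)
```

The key effect, matching the text, is that finest-resolution mixtures never seen in the training data still move with their parent node instead of staying at the prior.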