VOICING-STATE CLASSIFICATION OF CO-CHANNEL SPEECH USING NONLINEAR STATE-SPACE RECONSTRUCTION

Y. A. Mahgoub and R. M. Dansereau

Carleton University, Department of Systems & Computer Engineering
1125 Colonel By Drive, Ottawa, ON, K1S 5B6, Canada

ABSTRACT

This paper presents a new classification method that determines the voicing-state of co-channel speech using nonlinear state-space reconstruction. Nonlinear approaches are known to give better access to the full dynamics of the speech system than linear techniques. Three voicing-states of co-channel speech are considered: Unvoiced/Unvoiced (U/U), Voiced/Unvoiced (V/U), and Voiced/Voiced (V/V). The proposed method requires neither a priori information nor speech training data. Nevertheless, simulation results show improved performance in identifying the three voicing-states with the proposed method compared with other existing techniques.

1. INTRODUCTION

Co-channel speech is defined as the composite speech signal of two or more talkers [1]. It commonly occurs when speech signals from simultaneous, independent sources combine into one signal at the receiving microphone, or when two speech signals are transmitted simultaneously over a single channel. A voicing-state classifier is essential in many applications where co-channel speech may occur, including Automatic Speech Recognition (ASR), Speaker Identification (SID), and speech enhancement.

Most conventional techniques for classifying the voicing-state of single-talker and co-channel speech rely on a pattern recognition approach and treat the speech system as a linear system [2], [3], [7]. However, over the last two decades, the application of nonlinear dynamical methods to speech characterization and analysis has produced numerous new and promising approaches.
For example, the results in [4] and [5] show that nonlinear methods yield better performance than linear techniques on the general problems of pitch determination and speech enhancement. Nonlinear approaches are known to give better access to the full dynamics of the speech system than linear techniques.

Previous work on voicing-state classification of co-channel speech has shown some success using either a priori information about the individual speakers [6] or training data sets [7]. However, a priori information is unavailable in many practical situations. Methods that rely on training data are also speaker- and environment-dependent: every time the recording conditions or the background noise level change, a new training set is required. In [8], the Spectral Autocorrelation Peak Valley Ratio (SAPVR) technique was used to locate only the "usable speech" segments (single-talker voiced frames). Little attention has been given to the other voicing-state classes of co-channel speech.

In this paper, a new voicing-state classification algorithm for co-channel speech, based on nonlinear state-space reconstruction, is proposed. Three voicing-states are considered in this study:

1. Unvoiced/Unvoiced (U/U): both speakers are in either the unvoiced state or the silence state.
2. Voiced/Unvoiced (V/U): only one speaker is in the voiced state.
3. Voiced/Voiced (V/V): both speakers are in the voiced state.

The silence state is assumed to be a subset of the unvoiced class. It is also assumed that there is no need to identify which speaker is voiced in the V/U class.

The rest of the paper is organized as follows. Sec. 2 explains the principle of state-space reconstruction. Sec. 3 describes the proposed method. Sec. 4 compares the proposed algorithm with other existing techniques through computer simulations. Finally, Sec. 5 presents the conclusions.

2. STATE-SPACE RECONSTRUCTION

State-space (also called phase-space) reconstruction is the first step in nonlinear time series analysis. It views a one-dimensional data series $s(n)$, $n = 1, 2, \ldots, N$, in an $m$-dimensional Euclidean space $\mathbb{R}^m$. With this method, the trajectories that connect the data points (vectors) in the reconstructed state-space are expected to form an attractor that preserves the topological properties of the original, unknown attractor. A common way to reconstruct the state-space is the method of delays introduced by

I - 409    0-7803-8874-7/05/$20.00 ©2005 IEEE    ICASSP 2005