VOICING-STATE CLASSIFICATION OF CO-CHANNEL SPEECH USING
NONLINEAR STATE-SPACE RECONSTRUCTION
Y. A. Mahgoub and R. M. Dansereau
Carleton University, Department of Systems & Computer Engineering
1125 Colonel By Drive, Ottawa, ON, K1S 5B6, Canada
ABSTRACT
This paper presents a new classification method to
determine the voicing-state of co-channel speech based on
nonlinear state-space reconstruction. Nonlinear
approaches are known to give better access to the full
dynamics of the speech system than linear techniques. Three
voicing-states of co-channel speech are considered:
Unvoiced/Unvoiced (U/U), Voiced/Unvoiced (V/U), and
Voiced/Voiced (V/V). The proposed method requires
neither a priori information nor speech training data.
Nonetheless, simulation results show that the proposed
method outperforms other existing techniques in
identifying the three voicing-states.
1. INTRODUCTION
Co-channel speech is defined as the composite speech
signal of two or more talkers [1]. This phenomenon
commonly occurs due to the combination of speech
signals from simultaneous and independent sources into
one signal at the receiving microphone, or when two
speech signals are transmitted simultaneously over a
single channel. The use of a voicing-state classifier is
essential in many applications where co-channel speech
might occur. These include Automatic Speech Recognition
(ASR), Speaker Identification (SID), and speech
enhancement techniques.
Most of the conventional techniques used to classify the
voicing-state of single and co-channel speech rely on the
pattern recognition approach and treat the speech system
as a linear system [2], [3], [7]. However, the application of
nonlinear dynamical methods to speech characterization
and analysis has produced numerous new and promising
approaches over the last two decades. For example, the
results in [4] and [5] have shown enhanced performance in
solving the general problems of pitch determination and
speech enhancement by using nonlinear methods
compared to linear techniques. Nonlinear approaches
are known to give better access to the full dynamics of
the speech system than linear techniques.
Previous work on voicing-state classification of
co-channel speech has shown some success using either a
priori information about the individual speakers [6] or
training data sets [7]. However, a priori information is
often unavailable in practical situations. Also,
methods that use training data sets are speaker- and
environment-dependent. Every time the recording
conditions or the background noise level changes, a new
set of training data is required. In [8], an attempt was
made to locate only the “usable speech” segments
(single-talker voiced frames) using the Spectral
Autocorrelation Peak Valley Ratio (SAPVR) technique.
Little attention has been given to the other
voicing-state classes of co-channel speech.
In this paper, a new voicing-state classification
algorithm for co-channel speech, based on nonlinear
state-space reconstruction, is proposed. Three voicing-states are
considered in this study:
1. Unvoiced/Unvoiced (U/U): where both speakers
are either in the unvoiced state or the silence state.
2. Voiced/Unvoiced (V/U): where only one speaker is
in the voiced state.
3. Voiced/Voiced (V/V): where both speakers are in
the voiced state.
The silence state is assumed to be a subset of the unvoiced
class. Also, the two speakers are not differentiated in
the V/U class. In Sec. 2, the principle of
state-space reconstruction is explained. The proposed
method is described in Sec. 3. Comparisons of the
proposed algorithm with other existing techniques, based on
computer simulations, are presented in Sec. 4.
Finally, Sec. 5 includes the conclusions reached.
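The three class definitions listed above can be made concrete with a short
sketch. It only encodes the labelling convention (silence counted as
unvoiced, no distinction between the two speakers in V/U) for a frame whose
per-speaker states are already known, for example when preparing reference
labels. It is not the proposed classifier, and the names SpeakerState,
VoicingState, and joint_voicing_state are hypothetical, introduced here for
illustration only.

```python
from enum import Enum


class SpeakerState(Enum):
    """Per-speaker state of one frame (hypothetical names)."""
    SILENCE = "silence"
    UNVOICED = "unvoiced"
    VOICED = "voiced"


class VoicingState(Enum):
    """Joint voicing-state of a co-channel frame."""
    UU = "U/U"
    VU = "V/U"
    VV = "V/V"


def joint_voicing_state(a: SpeakerState, b: SpeakerState) -> VoicingState:
    """Map two per-speaker states to the joint class.

    Silence is treated as a subset of the unvoiced class, and the V/U
    class does not record which of the two speakers is voiced, so only
    the number of voiced speakers matters.
    """
    voiced_count = sum(s is SpeakerState.VOICED for s in (a, b))
    return (VoicingState.UU, VoicingState.VU, VoicingState.VV)[voiced_count]
```

Because only the count of voiced speakers is used, swapping the two
arguments never changes the returned class.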
2. STATE-SPACE RECONSTRUCTION
State-space (also called phase-space) reconstruction is the
first step in nonlinear time series analysis. It basically
views a single-dimensional data series, $s(n)$;
$n = 1, 2, \ldots, N$, in an $m$-dimensional Euclidean space,
$\mathbb{R}^m$.
Using this method, the trajectories that connect data points
(vectors) in the state-space are expected to form an
attractor that preserves the topological properties of the
original unknown attractor. A common way to reconstruct
the state-space is the method of delays introduced by
I - 409 0-7803-8874-7/05/$20.00 ©2005 IEEE ICASSP 2005