Proceedings of 2009 12th International Conference on Computer and Information Technology (ICCIT 2009)
21-23 December, 2009, Dhaka, Bangladesh
Novel Objective Criteria for Perceptual Separation of Two
Kinds of Distortion in Speech Enhancement Applications
Md. Jahangir Alam, Douglas O'Shaughnessy, Sid-Ahmed Selouani†
INRS-EMT, University of Quebec, Montreal QC, Canada
† University of Moncton, Campus de Shippagan, NB, Canada
alam@emt.inrs.ca, dougo@emt.inrs.ca, selouani@umcs.ca
Abstract
There is an increasing interest in the development of
robust quantitative speech quality measures that corre-
late well with subjective measures. This paper presents
two objective criteria, the Perceptual Signal to Audible
Noise Ratio (PSANR) and the Perceptual Signal to Audible
Distortion Ratio (PSADR), to characterize the two
kinds of degradation (i.e., residual background noise,
speech distortion or both) in speech enhancement ap-
plications. For performance evaluation of speech en-
hancement algorithms it is necessary to determine with
accuracy the kind of degradation present in the enhanced
signal. Experimental results for speech enhancement
using several well-known approaches demonstrate the
usefulness of the proposed objective criteria.
Objective speech quality measures can be classified
according to the perceptual domain transformation
module being used, as follows:
Time domain measures
Spectral domain measures and
Perceptual domain measures
Perceptual domain measures are shown to have the best
chance of predicting subjective quality of speech and
other audio signals since they are based on the human
auditory perception models.
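As an illustration of the time-domain class, the following is a minimal sketch of a segmental SNR, the example named in Figure 1. It is not taken from the paper: the function name, the frame length of 160 samples, and the clamping of per-frame values to the range [-10, 35] dB are assumptions reflecting one common convention, and the test signals are synthetic sinusoids rather than speech.

```python
import math

def segmental_snr(clean, processed, frame_len=160, lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNR values (dB) between a clean and a processed signal.

    frame_len, lo_db and hi_db are illustrative assumptions, not values
    specified in the paper.
    """
    n = min(len(clean), len(processed))
    frame_scores = []
    for start in range(0, n - frame_len + 1, frame_len):
        x = clean[start:start + frame_len]
        y = processed[start:start + frame_len]
        signal_energy = sum(v * v for v in x)
        error_energy = sum((a - b) ** 2 for a, b in zip(x, y))
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        if error_energy == 0.0:
            snr_db = hi_db  # identical frames: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp each frame so a few extreme frames do not dominate the mean.
        frame_scores.append(min(max(snr_db, lo_db), hi_db))
    if not frame_scores:
        raise ValueError("no usable frames")
    return sum(frame_scores) / len(frame_scores)

# Synthetic stand-ins for speech: a 200 Hz tone sampled at 8 kHz, plus a
# weaker 1234 Hz tone playing the role of residual noise.
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(1600)]
noisy = [v + 0.1 * math.sin(2 * math.pi * 1234 * t / 8000 + 0.5)
         for t, v in enumerate(clean)]

score = segmental_snr(clean, noisy)
print(f"segmental SNR: {score:.2f} dB")
```

A measure of this kind operates directly on waveform samples and uses no auditory model, which is why it sits in the time-domain branch of the classification.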
[Figure 1: tree diagram. Speech quality measures divide into
subjective measures (such as MOS) and objective measures;
objective measures divide into time domain (such as segmental
SNR), spectral domain (such as log spectral distance) and
perceptual domain (such as PESQ).]
Figure 1. Classification of speech quality measures.
The common feature of all objective criteria is that they
evaluate speech quality using a single parameter which
embeds every kind of degradation introduced by the
processing. Speech quality measures base their evaluation
on both the original and the degraded speech, according to
the following mapping:
$c : E^2 \to \mathbb{R}, \quad (x, y) \mapsto c(x, y)$    (1)
where E denotes the time, frequency or perceptual domain,
x and y denote the original speech and the observed speech
(altered by noise, or denoised after processing),
respectively, and c(x, y) is the score of the objective
measure. Mathematically, c is not a bijection from $E^2$ to
$\mathbb{R}$ (in particular, it is not injective): it is
possible to find a signal y' which is perceptually different
from y but has the same score as y, i.e. c(x, y) = c(x, y').
Assessing the denoised speech quality by means of two
parameters overcomes this non-bijection problem of classic
objective evaluation and better characterizes each kind of
speech degradation.
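The point that a single score cannot separate kinds of degradation can be made concrete with a small sketch. The example below is an illustration under stated assumptions, not the authors' experiment: it uses a plain global SNR as the score c, a synthetic tone in place of speech, additive tone-like noise for y, and a pure attenuation of x as a crude stand-in for signal distortion in y'. The attenuation gain is chosen so that both error energies match; whether listeners would actually rate the two signals differently is assumed here, not measured.

```python
import math

def snr_db(x, y):
    """Global SNR in dB, used here as a simple single-number score c(x, y)."""
    signal_energy = sum(v * v for v in x)
    error_energy = sum((a - b) ** 2 for a, b in zip(x, y))
    return 10.0 * math.log10(signal_energy / error_energy)

# Synthetic reference signal x (a 200 Hz tone at 8 kHz, not real speech).
x = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(800)]

# y: reference plus an additive component unrelated to x (residual noise).
noise = [0.1 * math.sin(2 * math.pi * 1234 * t / 8000) for t in range(800)]
y_noise = [a + n for a, n in zip(x, noise)]

# y': reference scaled by a gain g < 1, so the error is proportional to x.
# g is chosen so that the error energy equals that of the additive noise.
noise_energy = sum(n * n for n in noise)
signal_energy = sum(v * v for v in x)
g = 1.0 - math.sqrt(noise_energy / signal_energy)
y_dist = [g * a for a in x]

score_noise = snr_db(x, y_noise)
score_dist = snr_db(x, y_dist)
print(f"c(x, y)  = {score_noise:.6f} dB")
print(f"c(x, y') = {score_dist:.6f} dB")
```

Both calls return the same score up to floating-point rounding even though the two errors are of different kinds, which is the situation a pair of criteria such as PSANR and PSADR is meant to resolve.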
I. INTRODUCTION
Quality assessment of the processed speech signal can
be done using subjective listening tests or objective
quality measures, as shown in Figure 1. Subjective
listening tests such as the Mean Opinion Score (MOS) or
Degradation MOS (DMOS) provide perhaps the most reliable
method for assessing speech quality. Subjective evaluation
involves comparisons of original and processed
speech signals by a group of listeners who are asked to
rate the quality of the speech signal along a pre-determined
scale. These tests, however, can be time consuming,
requiring in most cases access to trained listeners.
For these reasons, several researchers have investigated
the possibility of devising objective, rather than subjec-
tive, measures of speech quality [5, 8-11].
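To make the relationship between the two kinds of measure concrete, the sketch below computes a MOS as the mean of listener ratings and then the Pearson correlation between MOS values and objective scores across test conditions. Everything in it is hypothetical: the condition names, the ratings (assumed to lie on a 1 to 5 scale), and the objective scores are invented for illustration and are not results from the paper.

```python
import math

def mean_opinion_score(ratings):
    """MOS for one condition: the arithmetic mean of the listeners' ratings."""
    return sum(ratings) / len(ratings)

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical listener ratings (1 = bad ... 5 = excellent) per condition.
ratings_per_condition = {
    "cond_a": [4, 5, 4, 4, 3],
    "cond_b": [3, 3, 2, 4, 3],
    "cond_c": [2, 1, 2, 2, 3],
}

# Hypothetical scores from some objective measure for the same conditions.
objective_scores = {"cond_a": 3.9, "cond_b": 2.8, "cond_c": 2.1}

conditions = sorted(ratings_per_condition)
mos = {c: mean_opinion_score(ratings_per_condition[c]) for c in conditions}
r = pearson([mos[c] for c in conditions],
            [objective_scores[c] for c in conditions])

print("MOS per condition:", mos)
print(f"correlation with objective scores: {r:.3f}")
```

A correlation close to 1 over many conditions is the sense in which an objective measure is said to track subjective quality; three conditions, as used here, are far too few to support any real conclusion.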
The aim of objective speech quality measures is to
achieve high correlation with subjective speech quality
measures such as the Mean Opinion Score (MOS) or the
Degradation MOS (DMOS). An ideal objective speech
quality measure would be able to assess the quality of
the degraded or processed speech by simply observing
the speech in question, without accessing the original
speech. Much progress has been made in developing
such an objective measure [5, 8-11]. Current objective
measures are limited in that most require access to the
original speech signal and some can only model the
low-level processing (e.g., masking effects) of the audi-
tory system. Yet, despite these limitations, some of
these objective measures have been found to correlate
well with subjective listening tests [11]. Objective
Keywords: speech enhancement, masking threshold,
objective quality measure, PSANDR.
978-1-4244-6284-1/09/$26.00 ©2009 IEEE