Proceedings of 2009 12th International Conference on Computer and Information Technology (ICCIT 2009), 21-23 December, 2009, Dhaka, Bangladesh
978-1-4244-6284-1/09/$26.00 ©2009 IEEE

Novel Objective Criteria for Perceptual Separation of Two Kinds of Distortion in Speech Enhancement Applications

Md. Jahangir Alam†, Douglas O'Shaughnessy†, Sid-Ahmed Selouani‡
†INRS-EMT, University of Quebec, Montreal, QC, Canada
‡University of Moncton, Campus de Shippagan, NB, Canada
alam@emt.inrs.ca, dougo@emt.inrs.ca, selouani@umcs.ca

Abstract

There is increasing interest in the development of robust quantitative speech quality measures that correlate well with subjective measures. This paper presents two objective criteria, the Perceptual Signal to Audible Noise Ratio (PSANR) and the Perceptual Signal to Audible Distortion Ratio (PSADR), to characterize the two kinds of degradation (i.e., residual background noise, speech distortion or both) in speech enhancement applications. Evaluating the performance of speech enhancement algorithms requires determining accurately which kind of degradation is present in the enhanced signal. Experimental results for speech enhancement using different well-known approaches demonstrate the usefulness of the proposed objective criteria.

Keywords: speech enhancement, masking threshold, objective quality measure, PSANDR.

I. INTRODUCTION

Quality assessment of a processed speech signal can be done using subjective listening tests or objective quality measures, as shown in Figure 1. Subjective listening tests such as Mean Opinion Score (MOS) or Degradation MOS (DMOS) provide perhaps the most reliable method for assessing speech quality. Subjective evaluation involves comparisons of original and processed speech signals by a group of listeners who are asked to rate the quality of the speech signal along a pre-determined scale. These tests, however, can be time consuming and in most cases require access to trained listeners. For these reasons, several researchers have investigated the possibility of devising objective, rather than subjective, measures of speech quality [5, 8-11].

The aim of objective speech quality measures is to achieve high correlation with subjective speech quality measures such as MOS or DMOS. An ideal objective speech quality measure would be able to assess the quality of the degraded or processed speech by simply observing the speech in question, without accessing the original speech. Much progress has been made in developing such an objective measure [5, 8-11]. Current objective measures are limited in that most require access to the original speech signal and some can only model the low-level processing (e.g., masking effects) of the auditory system. Yet, despite these limitations, some of these objective measures have been found to correlate well with subjective listening tests [11].

Objective speech quality measures can be classified according to the perceptual domain transformation module being used: time domain measures, spectral domain measures, and perceptual domain measures. Perceptual domain measures have been shown to have the best chance of predicting the subjective quality of speech and other audio signals, since they are based on models of human auditory perception.

[Figure 1. Classification of speech quality measures. Speech quality measures divide into subjective measures (such as MOS) and objective measures; objective measures divide into time domain (such as segmental SNR), spectral domain (such as log spectral distance), and perceptual domain (such as PESQ) measures.]

The common point of all objective criteria is that they evaluate speech quality using a single parameter that embeds all kinds of degradation introduced by any processing. Indeed, speech quality measures base their evaluation on both the original and the degraded speech according to the following mapping:

$c : E^2 \to \mathbb{R}, \quad (x, y) \mapsto c(x, y)$    (1)

where $E$ denotes the time, frequency or perceptual domain, $x$ and $y$ denote the original speech and the observed speech (altered by noise, or denoised after processing), respectively, and $c(x, y)$ is the score of the objective measure. Mathematically, $c$ is not a bijection from $E^2$ to $\mathbb{R}$; in particular, it is not injective. This means that it is possible to find a signal $y'$ that is perceptually different from $y$ but obtains the same score as $y$, i.e., $c(x, y) = c(x, y')$. Assessing the denoised speech quality by means of two parameters makes it possible to overcome this non-bijection problem of classic objective evaluation and to better characterize each kind of speech degradation.
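The limitation of a single-score measure c(x, y), namely that two perceptually different degraded signals can receive the same score, can be illustrated with a small numerical sketch. The example below is only an illustration under stated assumptions: it uses a plain global time-domain SNR as a stand-in for c (it is not the PSANR or PSADR proposed in this paper), a synthetic 200 Hz tone as a hypothetical "clean speech" signal, and a simple gain error as a crude model of speech distortion. All names (`snr_db`, `y_noisy`, `y_distorted`) are hypothetical.

```python
import math
import random


def snr_db(x, y):
    """Global time-domain SNR in dB, used here as one possible score c(x, y)."""
    signal_energy = sum(s * s for s in x)
    error_energy = sum((s - d) ** 2 for s, d in zip(x, y))
    return 10.0 * math.log10(signal_energy / error_energy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 8000

# Assumption: a 200 Hz tone sampled at 8 kHz stands in for clean speech x.
x = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(n)]

# y: the clean signal plus residual background noise (Gaussian, std 0.1).
noise = [0.1 * rng.gauss(0.0, 1.0) for _ in range(n)]
y_noisy = [s + e for s, e in zip(x, noise)]

# y': no added noise, but the signal itself is attenuated.
# The gain error a is chosen so that the error energy equals the noise energy.
noise_energy = sum(e * e for e in noise)
signal_energy = sum(s * s for s in x)
a = math.sqrt(noise_energy / signal_energy)
y_distorted = [(1.0 - a) * s for s in x]

score_noisy = snr_db(x, y_noisy)
score_distorted = snr_db(x, y_distorted)
print(f"c(x, y)  = {score_noisy:.4f} dB  (residual noise only)")
print(f"c(x, y') = {score_distorted:.4f} dB  (signal distortion only)")
```

Both scores agree up to floating-point rounding, although one signal contains only additive noise and the other only a distortion of the signal itself. A single number cannot tell the two cases apart, which is the motivation for reporting noise and distortion with two separate parameters.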