Incoherence of the Signal-Based Passive Speech Quality Estimation Models
Adil Raja*, Colin Flanagan
The authors are with the Department of Electronic and Computer Engineering, University of Limerick, Ireland (e-mail: {adil.raja, colin.flanagan}@ul.ie).
SPL EDICS: SPE-ANAL (Speech coding, synthesis and analysis)

Abstract—Over the recent past there has been a significant improvement in devising perceptual models for passively estimating the speech quality offered by telecommunication networks. At the same time, telephony networks are being adapted to support voice over IP (VoIP). To date, the listening quality of VoIP remains dominated by packet loss. In this paper we claim that passive signal-based speech quality estimation models are inherently incapable of capturing a packet loss event. We support our claim by examining the underlying DSP-based techniques.

Index Terms—speech quality, genetic algorithms, genetic programming, symbolic regression.

I. INTRODUCTION
Speech quality estimation is vital to evaluating the quality of service offered by a telecommunication network. Traditionally, speech quality is estimated using subjective tests, in which a group of human listeners evaluates the speech signal under test and assigns an opinion score on an integer scale ranging from 1 (bad) to 5 (excellent). The average of these scores, termed the Mean Opinion Score (MOS), is considered the ultimate determinant of speech quality [1].

Subjective tests are, however, time-consuming and expensive. To overcome these limitations, there has been growing interest in devising software-based objective assessment models. Objective models are further divided into two kinds: intrusive and non-intrusive. Intrusive models evaluate the quality of a distorted speech signal in the presence of a corresponding reference signal. The current ITU-T Recommendation P.862 (PESQ) is an example of such an approach.
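To make the role of the reference signal concrete, the sketch below computes the classical segmental signal-to-noise ratio (segSNR), a simple intrusive measure that compares each frame of the distorted signal against the time-aligned reference. It is not PESQ and is offered only as an illustration. The 160-sample frame length (20 ms at an assumed 8 kHz sampling rate) and the clamping limits of -10 dB and 35 dB are assumed conventions, not values taken from this paper. A non-intrusive model, by contrast, would have to produce its estimate from `degraded` alone.

```python
import math


def segmental_snr(reference, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Illustrative segmental SNR (dB): an intrusive measure that needs the clean reference.

    Assumptions: both signals are time-aligned and of equal length; frame_len,
    floor_db and ceil_db are conventional choices, not values from the paper.
    """
    if len(reference) != len(degraded):
        raise ValueError("reference and degraded signals must be time-aligned and of equal length")

    frame_snrs = []
    # Walk over non-overlapping frames of frame_len samples.
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        # The "noise" is the sample-wise difference from the reference,
        # which is exactly what a non-intrusive model cannot compute.
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if signal_energy == 0.0:
            continue  # skip silent reference frames
        if noise_energy == 0.0:
            snr = ceil_db  # a perfect frame is capped at the ceiling
        else:
            snr = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp each frame so outliers do not dominate the average.
        frame_snrs.append(min(max(snr, floor_db), ceil_db))

    if not frame_snrs:
        raise ValueError("no non-silent frames in the reference signal")
    return sum(frame_snrs) / len(frame_snrs)
```

For example, scaling a 440 Hz tone by 0.9 leaves an error of 0.1 times the signal in every frame, so each frame's SNR is 10 log10(100) = 20 dB and the function returns approximately 20.0; passing the reference as its own degraded copy returns the 35 dB ceiling.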
Non-intrusive models, on the other hand, do not have this advantage and instead base their estimates solely on features extracted from the signal under test. For this reason, the results of non-intrusive models are generally considered inferior to those of intrusive models. Non-intrusive models can be further classified as either signal-based or parametric. Signal-based models process the distorted speech signal using techniques such as modeling the human speech production system, auditory signal processing and/or other waveform processing